First I used DosageConvertor (http://genome.sph.umich.edu/wiki/DosageConvertor) to convert from .vcf to .dose files.
From the mach2databel documentation it looks like you need .mldose and .mlinfo files, but the format of my files looks exactly the same (I have never managed to find the actual differences between .mldose and .dose).
The .info file also looks similar to the one in the example, except that I don't have the rs numbers and the columns are not the same, although there are 7 of them, as there should be.
I even changed the suffixes to .mldose and .mlinfo, and in the .mldose file I changed the second column from DOSE to MLDOSE.
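For reference, the change to the second column can be scripted rather than done by hand. This is only a minimal sketch in base R, assuming a whitespace-separated dose file with no header line; the helper name `relabel_dose` is hypothetical and not part of DatABEL or DosageConvertor.

```r
# Hypothetical helper: rewrite the second field of each line from DOSE to MLDOSE.
# Assumes whitespace-separated fields and no header line.
relabel_dose <- function(infile, outfile) {
  lines <- readLines(infile)
  # Only the second whitespace-delimited field is touched; lines that
  # already say MLDOSE are left unchanged.
  lines <- sub("^(\\S+\\s+)DOSE(\\s)", "\\1MLDOSE\\2", lines, perl = TRUE)
  writeLines(lines, outfile)
  invisible(outfile)
}
```

Writing to a separate output file keeps the original DosageConvertor output intact for comparison.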
When running mach2databel() I get the following error over and over:
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> libloc <- "/tsd/p88/home/p88-sigridbo/RPackages"
> repoloc <- "file://tsd/shared/R/cran"
> library(GenABEL, lib=libloc)
Loading required package: MASS
Loading required package: GenABEL.data
> library(DatABEL, lib=libloc)
DatABEL v.0.9-6 loaded
> fvdose <- mach2databel("chr1.short.20ind.mldose", "chr1.test.mlinfo", "new")
Read 3500 items
Options in effect:
--infile = chr1.short.20ind.mldose
--outfile = new
--skiprows = OFF
--skipcols = 2
--cnrow = ON, using data from file './tmp953597'
--rncol = ON, using column 1 of 'chr1.short.20ind.mldose'
--transpose = OFF
--Rmatrix = OFF
--nanString = NA
Reading columns from ./tmp953597: Number of names in column-names file './tmp953597' is 500
Number of lines in source file is 20
Number of words in source file is 502
skiprows = 0
cnrow = 0
skipcols = 2
rncol = 1
Rmatrix = 0
numWords = 502
Creating file with numRows = 20
Creating file with numColumns = 500
Transposing new_fvtmp => new.
Error in `dimnames<-`(`*tmp*`, value = list(c("9477891071_R06C01", "9477888116_R01C02", :
non-unigue names in dim [] (use set_dimnames?)
Calls: mach2databel -> dimnames<- -> dimnames<-
In addition: Warning message:
In uninames(.Object@data) :
uninames: some column names are not unique; use set_dimnames/get_dimnames for non-unique row/col names
I have looked through my .mldose file multiple times, and it looks exactly the same as the test file (which I am able to convert without any errors), except for the actual ID numbers of course. I have tried both tab-separated and space-separated files. I have even tried changing the ID numbers to a format similar to the test files (id1111123, id1111134, etc.), just to see if that worked, but it still did not.
This is what my .mldose file looks like:
1->9455456984_RO6C01 MLDOSE 0.745 0.001 0.864 ...etc.
2->9455458937_RO1CO2 MLDOSE 0.869 0.001 0.866 ...etc.
3->9456874354_RO5CO1 MLDOSE 0.912 0.001 0.853 ...etc.
(The file consists of 500 SNPs and 20 individuals)
And this is what my .mlinfo file looks like:
SNP REF ALT DoseMAF RefAF AvgCall Rsq
1:10177 A AC 0.45006 0.42532 0.56657 0.01470
1:10235 T TA 0.00041 0.00120 0.99959 0.000002
1:10352 T TA 0.44438 0.43389 0.57208 0.01639
(The file consists of 500 SNPs)
I have only used a subsample of 20 individuals, so I have double-checked that no IDs are duplicated.
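A duplicate check of this kind can also be scripted. The sketch below is base R only, assuming whitespace-separated files in which the names of interest sit in the first column; the helper name `dup_names` is hypothetical. Since the warning in the log mentions column names, it may be worth running the check on the SNP names in the .mlinfo file as well as on the sample IDs.

```r
# Hypothetical helper: return the values that occur more than once
# in one whitespace-separated column of a text file.
dup_names <- function(file, column = 1, header = FALSE) {
  tab <- read.table(file, header = header, colClasses = "character",
                    comment.char = "", quote = "")
  x <- tab[[column]]
  unique(x[duplicated(x)])
}

# Possible usage with the files from the log above:
# dup_names("chr1.short.20ind.mldose")           # sample IDs (first column)
# dup_names("chr1.test.mlinfo", header = TRUE)   # SNP names (first column)
```

An empty result means no name is repeated in that column; any values returned are the ones that appear more than once.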
I have also run it on the full chromosome, to make sure the error was not introduced when extracting the 500 SNPs, but I get the same error.
It seems to be a problem with the .mldose-file, but I am not sure.
Any help would be very much appreciated.