mach2databel - Error in dimnames..

Posted: Wed Feb 08, 2017 1:21 pm
by sigridbo
I have been trying to convert my .vcf files into filevector format for ProbABEL analyses.
First I used DosageConvertor (http://genome.sph.umich.edu/wiki/DosageConvertor) to convert from .vcf to .dose files.
From the mach2databel documentation it looks like you need .mldose and .mlinfo files, but the format of my files looks exactly the same (I have never managed to find the actual differences between .mldose and .dose).
The .info file also looks similar to the one in the example, except that I don't have the rs-numbers and the columns are not the same, but there are 7 of them, as there should be.
I even changed the suffixes to .mldose and .mlinfo, and in the .mldose file I changed the second column from DOSE to MLDOSE.
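
For reference, the column change described above could be scripted roughly as follows. This is only a minimal sketch in base R: relabel_dose_column() is a hypothetical helper (not part of GenABEL or DatABEL), the demo values are made up, and it assumes whitespace-separated files small enough to read into memory.

```r
# Sketch: relabel the second column of a dose file from DOSE to MLDOSE.
# relabel_dose_column() is a hypothetical helper, not a GenABEL/DatABEL function.
relabel_dose_column <- function(infile, outfile) {
  lines <- readLines(infile)
  # Touch only the second whitespace-separated field, and only if it is DOSE.
  lines <- sub("^(\\S+\\s+)DOSE(\\s)", "\\1MLDOSE\\2", lines, perl = TRUE)
  writeLines(lines, outfile)
  invisible(outfile)
}

# Demonstration on a temporary file with made-up IDs and dosages
demo_in  <- tempfile(fileext = ".dose")
demo_out <- tempfile(fileext = ".mldose")
writeLines(c("1->idA DOSE 0.745 0.001 0.864",
             "2->idB DOSE 0.869 0.001 0.866"), demo_in)
relabel_dose_column(demo_in, demo_out)
relabeled <- readLines(demo_out)
print(relabeled)
```

Lines whose second field is already MLDOSE are left unchanged, so running it twice is harmless.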

When running mach2databel() I get the following error over and over:

Code:

R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> libloc <- "/tsd/p88/home/p88-sigridbo/RPackages"
> repoloc <- "file://tsd/shared/R/cran"
> .libPaths()
[1] "/net/tsd-evs.tsd.usit.no/p88/home/p88-sigridbo/RPackages"                       
[2] "/net/tsd-evs.tsd.usit.no/p88/home/p88-sigridbo/R/x86_64-pc-linux-gnu-library/3.2"
[3] "/net/p01-c2io-nfs/software/VERSIONS/R-packages/3.2"                             
[4] "/net/p01-c2io-nfs/software/VERSIONS/R/3.2.2/lib64/R/library"                     
> library(GenABEL, lib=libloc)
Loading required package: MASS
Loading required package: GenABEL.data
> library(DatABEL, lib=libloc)
DatABEL v.0.9-6 loaded

>
> fvdose <- mach2databel("chr1.short.20ind.mldose", "chr1.test.mlinfo", "new")
Read 3500 items
Options in effect:
    --infile    = chr1.short.20ind.mldose
    --outfile   = new
    --skiprows  = OFF
    --skipcols  = 2
    --cnrow     = ON, using data from file './tmp953597'
    --rncol     = ON, using column 1 of 'chr1.short.20ind.mldose'
    --transpose = OFF
    --Rmatrix   = OFF
    --nanString = NA
Reading columns from ./tmp953597: Number of names in column-names file './tmp953597' is 500

Number of lines in source file is 20
Number of words in source file is 502
skiprows = 0
cnrow = 0
skipcols = 2
rncol = 1
Rmatrix = 0
numWords = 502
Creating file with numRows = 20
Creating file with numColumns = 500
Transposing new_fvtmp => new.
text2fvf finished.
Error in `dimnames<-`(`*tmp*`, value = list(c("9477891071_R06C01", "9477888116_R01C02",  :
  non-unigue names in dim [[2]] (use set_dimnames?)
Calls: mach2databel -> dimnames<- -> dimnames<-
In addition: Warning message:
In uninames(.Object@data) :
  uninames: some column names are not unique; use set_dimnames/get_dimnames for non-unique row/col names
Execution halted


I have looked through my .mldose file multiple times, and it looks exactly the same as the test file (which I am able to convert without any errors), except for the actual ID numbers of course. I have tried both tab-separated and space-separated files. I have even tried changing the ID numbers to a format similar to the test files (id1111123, id1111134, etc.), just to see if that worked, but it still did not work.


This is what my .mldose file looks like:

Code:

1->9455456984_RO6C01 MLDOSE 0.745 0.001 0.864   ...etc.
2->9455458937_RO1CO2 MLDOSE 0.869 0.001 0.866 ...etc.
3->9456874354_RO5CO1 MLDOSE 0.912 0.001 0.853 ...etc.
etc.


(The file consists of 500 SNPs and 20 individuals)

And this is what my .mlinfo file looks like:

Code:

SNP      REF   ALT   DoseMAF   RefAF   AvgCall   Rsq
1:10177   A   AC   0.45006   0.42532   0.56657   0.01470   
1:10235   T   TA   0.00041   0.00120   0.99959   0.000002
1:10352   T   TA   0.44438   0.43389   0.57208   0.01639
etc.   


(The file consists of 500 SNPs)

I am only using a subsample of 20 individuals, and I have double-checked that no IDs are duplicated.
I have also run it on the full chromosome, to make sure the error was not introduced when extracting the 500 SNPs, but I get the same error.

It seems to be a problem with the .mldose-file, but I am not sure.
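
A duplicate check of this kind can be sketched in base R. The error message refers to dim [[2]], i.e. the column names, and the log shows a column-names file with 500 entries, so it may be worth running the same check on the first column of the .mlinfo file as well as on the IDs. The helpers first_field() and find_duplicates() are hypothetical (not from GenABEL or DatABEL), the sketch assumes whitespace-separated files with one header line in the .mlinfo file, and the repeated row in the demo is made up purely for illustration.

```r
# Sketch: report values that occur more than once in the first column of a
# whitespace-separated file. first_field() and find_duplicates() are
# hypothetical helpers, not functions from GenABEL or DatABEL.
first_field <- function(path, skip = 0) {
  lines <- readLines(path)
  if (skip > 0) lines <- lines[-seq_len(skip)]   # drop header line(s)
  lines <- lines[nzchar(trimws(lines))]          # ignore blank lines
  sub("^\\s*(\\S+).*$", "\\1", lines)            # keep only the first field
}

find_duplicates <- function(x) unique(x[duplicated(x)])

# Demonstration on a made-up .mlinfo file in which one name is repeated
demo_info <- tempfile(fileext = ".mlinfo")
writeLines(c("SNP REF ALT DoseMAF RefAF AvgCall Rsq",
             "1:10177 A AC 0.45006 0.42532 0.56657 0.01470",
             "1:10235 T TA 0.00041 0.00120 0.99959 0.000002",
             "1:10177 A C 0.10000 0.10000 0.90000 0.50000"), demo_info)
snp_names <- first_field(demo_info, skip = 1)
dup_snps <- find_duplicates(snp_names)
print(dup_snps)
```

On the real files this would be find_duplicates(first_field("chr1.test.mlinfo", skip = 1)) for the SNP names and find_duplicates(first_field("chr1.short.20ind.mldose")) for the IDs. An empty result means that column has no repeated names.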

Any help would be very much appreciated.

Re: mach2databel - Error in dimnames..

Posted: Thu Feb 09, 2017 8:42 pm
by lckarssen
Thanks for joining the forum. I will try and see if I can reproduce this. I happen to have to convert some imputed data in VCF format to DatABEL files myself. I will get back to you in a couple of days.

Re: mach2databel - Error in dimnames..

Posted: Mon Feb 13, 2017 10:29 am
by sigridbo
Thank you very much!

Another thing I could mention is that the .dose file was in this format:

Code:

9455456984_RO6C01->9455456984_RO6C01 MLDOSE 0.745 0.001 0.864   ...etc.
9455458937_RO1CO2->9455458937_RO1CO2 MLDOSE 0.869 0.001 0.866 ...etc.
9456874354_RO5CO1->9456874354_RO5CO1 MLDOSE 0.912 0.001 0.853 ...etc.
etc.


after conversion with DosageConvertor, but since I got this dimnames error, I tried changing the first part (before the ->) to 1, 2, 3, etc. to match the mldose example file. However, neither version worked.
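
For completeness, the renumbering of the part before the -> could be done along these lines. It is a minimal sketch only: renumber_prefix() is a hypothetical helper, the demo lines are the three truncated example rows from above, and it assumes every line starts with a prefix followed by ->.

```r
# Sketch: replace everything before the first "->" with a running number.
# renumber_prefix() is a hypothetical helper, not a GenABEL/DatABEL function.
renumber_prefix <- function(lines) {
  # Stop early if any line lacks the expected "prefix->" at its start.
  stopifnot(all(grepl("^\\S*->", lines, perl = TRUE)))
  rest <- sub("^\\S*?->", "", lines, perl = TRUE)  # text after the first "->"
  paste0(seq_along(lines), "->", rest)
}

dose_lines <- c(
  "9455456984_RO6C01->9455456984_RO6C01 MLDOSE 0.745 0.001 0.864",
  "9455458937_RO1CO2->9455458937_RO1CO2 MLDOSE 0.869 0.001 0.866",
  "9456874354_RO5CO1->9456874354_RO5CO1 MLDOSE 0.912 0.001 0.853"
)
renumbered <- renumber_prefix(dose_lines)
print(renumbered)
```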