Re: mach2databel - Error in dimnames..

Posted: Thu Feb 09, 2017 8:42 pm
by lckarssen
Thanks for joining the forum. I will try and see if I can reproduce this. I happen to have to convert some imputed data in VCF format to DatABEL files myself. I will get back to you in a couple of days.

Re: mach2databel - Error in dimnames..

Posted: Mon Feb 13, 2017 10:29 am
by sigridbo
Thank you very much!

Another thing I could mention is that the .dose file had the following format:

Code:

9455456984_RO6C01->9455456984_RO6C01 MLDOSE 0.745 0.001 0.864   ...etc.
9455458937_RO1CO2->9455458937_RO1CO2 MLDOSE 0.869 0.001 0.866 ...etc.
9456874354_RO5CO1->9456874354_RO5CO1 MLDOSE 0.912 0.001 0.853 ...etc.
etc.


after conversion with DosageConvertor. Since I got this dimnames error, I tried changing the first part (before the ->) to 1, 2, 3, etc. to match the mldose example file. However, none of these attempts worked.

Re: mach2databel - Error in dimnames..

Posted: Mon Apr 03, 2017 10:48 am
by sigridbo
I think we found the solution.

The VCF files contain several variants of the same marker (e.g. biallelic SNPs, indels or deletions), split across several lines.
This means that one marker (one position) may appear several times in the VCF file.
After removing the "duplicates", the program seems to work fine, at least on the short test files.

However, I still cannot run it on one big VCF file; a different error now appears:

Code:

Loading required package: MASS
Loading required package: GenABEL.data
DatABEL v.0.9-6 loaded

Read 9179937 items
Options in effect:
    --infile    = chr10.mach.dose
    --outfile   = chr10_out
    --skiprows  = OFF
    --skipcols  = 2
    --cnrow     = ON, using data from file './tmp562597'
    --rncol     = ON, using column 1 of 'chr10.mach.dose'
    --transpose = OFF
    --Rmatrix   = OFF
    --nanString = NA
Reading columns from ./tmp562597: Number of names in column-names file './tmp562597' is 1311420

Number of lines in source file is 6551
Number of words in source file is 706151
skiprows = 0
cnrow = 0
skipcols = 2
rncol = 1
Rmatrix = 0
numWords = 706151
Creating file with numRows = 6551
Creating file with numColumns = 706149
Overflow of FixedChar (length of name > NAMELENGTH (32): GTGCCCCTCTCCCACCTTCACCACCACAGCCCCA.
Overflow of FixedChar (length of name > NAMELENGTH (32): GTAACCAACCCCACACCTCCAGGAGCAGCAACTCAC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAAACTAATACGTGTTGTGTGTTGAGTAATAGC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AGATCAGACACGGAACCACCAGCAGCTTCATCTTGGACTTCCAGGTGAGAAGGTGGCAAACTCAGGTGCAGGCCTG.
Overflow of FixedChar (length of name > NAMELENGTH (32): AGCACTGTCCTCCACCCTGGGCCAAAGCACTGTCCTCCACCCTGGGCCAAAGCATT.
Overflow of FixedChar (length of name > NAMELENGTH (32): CCAGGAGGAGAAAGAAAATCCATGCTGCTCGCAAGAAGCAGGGCCA.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAAAAGAAAAAAGAAAAAGAAAAAAGAACTAAAGGAAGG.
Overflow of FixedChar (length of name > NAMELENGTH (32): AACATTGAATTGGAATATTTCGGATTCCCATTTTGCTTCTGAGCTTAT.
Overflow of FixedChar (length of name > NAMELENGTH (32): TAAAAAAGGCGGAGCCTGCAGTGAGCCGAGATTGCGCCACTGCACTCCAGCCTGGGCGACAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAA.
Overflow of FixedChar (length of name > NAMELENGTH (32): TGAAAAAATTGATCTGTGCATGGTTATGTCTTTTA.
Overflow of FixedChar (length of name > NAMELENGTH (32): GCTATTAGGTGAATCTGTAAGCTTTGCTGAGAGC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AATGAGGTGGAAATATACATGTTGATGAGGTGG.
Overflow of FixedChar (length of name > NAMELENGTH (32): GCTTTAGAGGTGACAGCACTCATGGAGCTGTTATCTCCCATGATAGCAAGGCCTT.
Overflow of FixedChar (length of name > NAMELENGTH (32): ATTATCAAATAGTATCAAACTTATACATTGAAACTCTAGTCCCCAGTATCTCTGAATGTGACTATAT.
Overflow of FixedChar (length of name > NAMELENGTH (32): CCTTCTATTCATAGGGATAGAATAGAAGTTCA.
Overflow of FixedChar (length of name > NAMELENGTH (32): AGATATATATACACTTTCCTGATACACATATCTGAATTCTT.
Overflow of FixedChar (length of name > NAMELENGTH (32): GTCTATATCTCTATAGATATCTCTATAGATATC.
Overflow of FixedChar (length of name > NAMELENGTH (32): TAAAAATACAAATACTTCTCTCTACTAAAAATAC.
Overflow of FixedChar (length of name > NAMELENGTH (32): GTGAAAAAACGAAGAAAACACTTCATTGGTTTGCTGTAGGACTTAAGAA.
Overflow of FixedChar (length of name > NAMELENGTH (32): TCTGCCAGAAAGACCTTTATGTGAAAGGTGTTGAGCAGC.
Overflow of FixedChar (length of name > NAMELENGTH (32): ACCCAATACAGGAGCACCCAGATTCATAAAGCAAGTCCTTAGAGAC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAAGATCAAAGAAGAAATCACAAGGGAAATTAGAAAATACTTAAGAG.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG.
Trying to set name of vars out of range (706149)


ERROR in Rstuff:failed in text2fvf_R
Index file not exists: chr10_out.fvi
Error in (function (cl, name, valueClass)  :
  assignment of an object of class “NULL” is not valid for @‘data’ in an object of class “databel”; is(value, "externalptr") is not TRUE
Calls: mach2databel ... new -> initialize -> initialize -> .local -> <Anonymous>
Execution halted



It looks like it has a problem with large indels, whose names exceed the NAMELENGTH limit of 32 reported in the log. Has anyone had the same problem?
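
For anyone hitting the same log output: it shows two things that can be checked before calling mach2databel, namely a name count (1311420) that differs from the number of columns created (706149), and names flagged against NAMELENGTH (32). Below is a minimal sketch in Python (not part of DatABEL) that checks both. The function name `preflight_check` and its parameters are hypothetical. Flagging names of 32 characters or more, rather than strictly more than 32, is an assumption based on the log, where one flagged name appears to be exactly 32 characters long. Whether either condition is the actual cause of the failure is not established here.

```python
def preflight_check(marker_names, first_dose_line, skipcols=2, name_limit=32):
    """Compare marker-name count with dosage-column count and flag long names.

    marker_names:    marker names as read from the info file (hypothetical input)
    first_dose_line: one line of the .dose file (ID, MLDOSE, then dosages)
    skipcols:        leading non-dosage columns, 2 in the log above
    name_limit:      assumed boundary; names this long or longer are flagged
    """
    n_dosages = len(first_dose_line.split()) - skipcols
    too_long = [name for name in marker_names if len(name) >= name_limit]
    return {
        "n_names": len(marker_names),
        "n_dosages": n_dosages,
        "counts_match": len(marker_names) == n_dosages,
        "too_long": too_long,
    }


# Tiny made-up example: three names but only two dosage columns,
# and one name that is far longer than the limit.
names = ["rs1", "rs2", "10:12345:" + "A" * 40]
dose = "ID1->ID1 MLDOSE 0.745 0.001"
report = preflight_check(names, dose)
print(report["counts_match"], len(report["too_long"]))
```

If `counts_match` is false or `too_long` is non-empty, it may be worth fixing the input files before attempting the conversion.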

Re: [SOLVED] mach2databel - Error in dimnames..

Posted: Thu Apr 06, 2017 3:34 pm
by sigridbo
I managed to find the solution.
The problem was not the IDs in the .dose files but the markers (positions) in the info files.

After imputation, SNPs with several alleles, or with additional indels or deletions, are separated into different rows in the .vcf-files, meaning that each "SNP" in the .mach.info-file might appear several times.
That caused the problem, and was why mach2databel reported duplicates.

Removing the "duplicates", leaving only the first occurrence, solved the problem.
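
The de-duplication described above could be sketched roughly as follows, in Python rather than R. This is only an illustration under stated assumptions: the info file is whitespace-separated with a header line and the marker name in the first column, and each .dose line has two leading columns (ID and MLDOSE) followed by one dosage per marker in info-file order. The function names are hypothetical. The sketch also drops the matching dosage columns so that the two files stay aligned; the posts above only mention removing rows from the info file, so that step is an assumption.

```python
def first_occurrence_indices(names):
    """Return the indices of the first occurrence of each name, in order."""
    seen = set()
    keep = []
    for i, name in enumerate(names):
        if name not in seen:
            seen.add(name)
            keep.append(i)
    return keep


def dedup_info_lines(info_lines):
    """Keep the header and the first row for each marker name.

    Returns the filtered lines and the kept row indices (0-based, header excluded).
    """
    header = info_lines[0]
    rows = [row for row in info_lines[1:] if row.strip()]  # ignore blank lines
    names = [row.split()[0] for row in rows]
    keep = first_occurrence_indices(names)
    return [header] + [rows[i] for i in keep], keep


def filter_dose_line(line, keep, skipcols=2):
    """Keep the leading columns plus only the dosages whose markers were kept."""
    fields = line.split()
    lead, dosages = fields[:skipcols], fields[skipcols:]
    return " ".join(lead + [dosages[i] for i in keep])


# Made-up example: marker 10:200 appears twice (a SNP and an indel at one position).
info = [
    "SNP Al1 Al2 Freq1 MAF Quality Rsq",
    "10:100 A G 0.90 0.10 0.99 0.98",
    "10:200 C T 0.80 0.20 0.97 0.95",
    "10:200 C CT 0.70 0.30 0.96 0.93",
    "10:300 G A 0.60 0.40 0.95 0.91",
]
clean_info, keep = dedup_info_lines(info)
clean_dose = filter_dose_line("ID1->ID1 MLDOSE 0.745 0.001 0.864 1.200", keep)
print(len(clean_info) - 1, "markers kept")
print(clean_dose)
```

Keeping the first occurrence mirrors what is described above; a different rule (for example keeping the record with the best imputation quality) would need a different selection step.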