[SOLVED] mach2databel - Error in dimnames..

lckarssen
Site Admin
Posts: 322
Joined: Tue Jan 04, 2011 3:04 pm
Location: Utrecht, The Netherlands

Re: mach2databel - Error in dimnames..

Postby lckarssen » Thu Feb 09, 2017 8:42 pm

Thanks for joining the forum. I will try and see if I can reproduce this. I happen to have to convert some imputed data in VCF format to DatABEL files myself. I will get back to you in a couple of days.
-------
Lennart Karssen
PolyOmica
The Netherlands
-------

sigridbo
Posts: 3
Joined: Wed Feb 08, 2017 12:45 pm

Re: mach2databel - Error in dimnames..

Postby sigridbo » Mon Feb 13, 2017 10:29 am

Thank you very much!

Another thing I could mention is that the .dose file had the following format:

Code: Select all

9455456984_RO6C01->9455456984_RO6C01 MLDOSE 0.745 0.001 0.864   ...etc.
9455458937_RO1CO2->9455458937_RO1CO2 MLDOSE 0.869 0.001 0.866 ...etc.
9456874354_RO5CO1->9456874354_RO5CO1 MLDOSE 0.912 0.001 0.853 ...etc.
etc.


That was the format after conversion with DosageConvertor. Because I got this dimnames error, I tried changing the first part (before the ->) to 1, 2, 3, etc. to match the mldose example file, but none of these attempts worked.

Re: mach2databel - Error in dimnames..

Postby sigridbo » Mon Apr 03, 2017 10:48 am

I think we found the solution.

The VCF files contain several variants of the same marker (e.g. biallelic SNPs, indels or deletions), each on its own line.
This means that one marker (one position) may appear several times in the VCF file.
After removing the "duplicates", the program seems to work fine, at least on the short test files.
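To check whether a file is affected before converting it, the repeated marker names can be counted. This is only a minimal Python sketch, not part of GenABEL or DatABEL. It assumes the marker name is the first whitespace-separated column and that the file starts with a header line. The function name and the example rows are made up for illustration.

```python
from collections import Counter


def duplicated_markers(info_lines, has_header=True):
    """Return {marker_name: count} for names that occur more than once.

    Assumption: the marker name is the first whitespace-separated field.
    """
    rows = info_lines[1:] if has_header else info_lines
    names = [line.split()[0] for line in rows if line.strip()]
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical example rows; a real info file has more columns.
example = [
    "SNP Al1 Al2",
    "10:1000 A G",
    "10:1000 A AT",
    "10:2000 C T",
]
print(duplicated_markers(example))
```

For a real file the lines could be read with `open(path).read().splitlines()` and passed in the same way.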

However, I still cannot run it on one big VCF file; a different error now appears:

Code: Select all

Loading required package: MASS
Loading required package: GenABEL.data
DatABEL v.0.9-6 loaded

Read 9179937 items
Options in effect:
    --infile    = chr10.mach.dose
    --outfile   = chr10_out
    --skiprows  = OFF
    --skipcols  = 2
    --cnrow     = ON, using data from file './tmp562597'
    --rncol     = ON, using column 1 of 'chr10.mach.dose'
    --transpose = OFF
    --Rmatrix   = OFF
    --nanString = NA
Reading columns from ./tmp562597: Number of names in column-names file './tmp562597' is 1311420

Number of lines in source file is 6551
Number of words in source file is 706151
skiprows = 0
cnrow = 0
skipcols = 2
rncol = 1
Rmatrix = 0
numWords = 706151
Creating file with numRows = 6551
Creating file with numColumns = 706149
Overflow of FixedChar (length of name > NAMELENGTH (32): GTGCCCCTCTCCCACCTTCACCACCACAGCCCCA.
Overflow of FixedChar (length of name > NAMELENGTH (32): GTAACCAACCCCACACCTCCAGGAGCAGCAACTCAC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAAACTAATACGTGTTGTGTGTTGAGTAATAGC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AGATCAGACACGGAACCACCAGCAGCTTCATCTTGGACTTCCAGGTGAGAAGGTGGCAAACTCAGGTGCAGGCCTG.
Overflow of FixedChar (length of name > NAMELENGTH (32): AGCACTGTCCTCCACCCTGGGCCAAAGCACTGTCCTCCACCCTGGGCCAAAGCATT.
Overflow of FixedChar (length of name > NAMELENGTH (32): CCAGGAGGAGAAAGAAAATCCATGCTGCTCGCAAGAAGCAGGGCCA.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAAAAGAAAAAAGAAAAAGAAAAAAGAACTAAAGGAAGG.
Overflow of FixedChar (length of name > NAMELENGTH (32): AACATTGAATTGGAATATTTCGGATTCCCATTTTGCTTCTGAGCTTAT.
Overflow of FixedChar (length of name > NAMELENGTH (32): TAAAAAAGGCGGAGCCTGCAGTGAGCCGAGATTGCGCCACTGCACTCCAGCCTGGGCGACAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAA.
Overflow of FixedChar (length of name > NAMELENGTH (32): TGAAAAAATTGATCTGTGCATGGTTATGTCTTTTA.
Overflow of FixedChar (length of name > NAMELENGTH (32): GCTATTAGGTGAATCTGTAAGCTTTGCTGAGAGC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AATGAGGTGGAAATATACATGTTGATGAGGTGG.
Overflow of FixedChar (length of name > NAMELENGTH (32): GCTTTAGAGGTGACAGCACTCATGGAGCTGTTATCTCCCATGATAGCAAGGCCTT.
Overflow of FixedChar (length of name > NAMELENGTH (32): ATTATCAAATAGTATCAAACTTATACATTGAAACTCTAGTCCCCAGTATCTCTGAATGTGACTATAT.
Overflow of FixedChar (length of name > NAMELENGTH (32): CCTTCTATTCATAGGGATAGAATAGAAGTTCA.
Overflow of FixedChar (length of name > NAMELENGTH (32): AGATATATATACACTTTCCTGATACACATATCTGAATTCTT.
Overflow of FixedChar (length of name > NAMELENGTH (32): GTCTATATCTCTATAGATATCTCTATAGATATC.
Overflow of FixedChar (length of name > NAMELENGTH (32): TAAAAATACAAATACTTCTCTCTACTAAAAATAC.
Overflow of FixedChar (length of name > NAMELENGTH (32): GTGAAAAAACGAAGAAAACACTTCATTGGTTTGCTGTAGGACTTAAGAA.
Overflow of FixedChar (length of name > NAMELENGTH (32): TCTGCCAGAAAGACCTTTATGTGAAAGGTGTTGAGCAGC.
Overflow of FixedChar (length of name > NAMELENGTH (32): ACCCAATACAGGAGCACCCAGATTCATAAAGCAAGTCCTTAGAGAC.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAAGATCAAAGAAGAAATCACAAGGGAAATTAGAAAATACTTAAGAG.
Overflow of FixedChar (length of name > NAMELENGTH (32): AAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG.
Trying to set name of vars out of range (706149)


ERROR in Rstuff:failed in text2fvf_R
Index file not exists: chr10_out.fvi
Error in (function (cl, name, valueClass)  :
  assignment of an object of class “NULL” is not valid for @‘data’ in an object of class “databel”; is(value, "externalptr") is not TRUE
Calls: mach2databel ... new -> initialize -> initialize -> .local -> <Anonymous>
Execution halted



It looks like it has a problem with large indels. Has anyone had the same problem?
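To see which names would trigger the overflow message before starting a long conversion, the names can be screened by length. The following Python sketch is only an illustration under assumptions. The limit of 32 is taken from the NAMELENGTH value in the log above, and because one of the rejected names in that log appears to be exactly 32 characters long, the sketch flags names of 32 characters or more. The function name is hypothetical, and what to do with the flagged names (rename or drop them) is left open.

```python
def long_names(names, limit=32):
    """Return the names whose length is at or above the limit.

    Assumption: 'limit' mirrors the NAMELENGTH (32) reported in the log.
    """
    return [name for name in names if len(name) >= limit]


# Two names copied from the log plus one hypothetical short name.
names = [
    "rs0000001",  # hypothetical short name
    "CCTTCTATTCATAGGGATAGAATAGAAGTTCA",
    "AAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG",
]
for name in long_names(names):
    print(len(name), name)
```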

Re: [SOLVED] mach2databel - Error in dimnames..

Postby sigridbo » Thu Apr 06, 2017 3:34 pm

I managed to find the solution.
The problem was not the IDs in the .dose files but the markers (positions) in the info files.

After imputation, SNPs with several alleles, or with additional indels or deletions, are separated into different rows in the .vcf-files, meaning that each "SNP" in the .mach.info-file might appear several times.
That caused the problem, and was why mach2databel reported duplicates.

Removing the "duplicates", leaving only the first occurrence, solved the problem.
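The clean-up described above could be sketched roughly as follows in Python. This is not the exact procedure that was used, only one way to do it under stated assumptions: the info file has a header line and the marker name in its first column, and each .dose line has two leading columns (ID and MLDOSE) followed by one dosage per info row, in the same order. The function name and example data are hypothetical. The sketch also drops the matching dosage columns so that the two files stay aligned.

```python
def drop_duplicate_markers(info_lines, dose_lines, has_header=True, lead_cols=2):
    """Keep only the first occurrence of each marker name.

    Returns (new_info_lines, new_dose_lines). Assumes dosage columns in
    the dose file follow the row order of the info file.
    """
    header = info_lines[:1] if has_header else []
    rows = [l for l in (info_lines[1:] if has_header else info_lines) if l.strip()]

    seen = set()
    keep = []  # indices of info rows (and dosage columns) to retain
    for i, line in enumerate(rows):
        name = line.split()[0]
        if name not in seen:
            seen.add(name)
            keep.append(i)

    new_info = header + [rows[i] for i in keep]

    new_dose = []
    for line in dose_lines:
        fields = line.split()
        lead, dosages = fields[:lead_cols], fields[lead_cols:]
        if len(dosages) != len(rows):
            raise ValueError("dose columns do not match number of info rows")
        new_dose.append(" ".join(lead + [dosages[i] for i in keep]))
    return new_info, new_dose


# Hypothetical miniature example.
info = ["SNP Al1 Al2", "10:1000 A G", "10:1000 A AT", "10:2000 C T"]
dose = ["s1->s1 MLDOSE 0.1 0.2 0.3"]
new_info, new_dose = drop_duplicate_markers(info, dose)
print(new_info)
print(new_dose)
```

The length check is there because a mismatch between the two files would otherwise silently remove the wrong dosage columns.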

