Questions about GenABEL (aka *ABEL) suite of packages
Post by Samantha » Fri Mar 27, 2015 12:48 pm

After converting my data to tped and tfam format in PLINK, I read it into GenABEL as follows:
convert.snp.tped(tpedfile="gwasGenabelIbsQced.tped",tfamfile="gwasGenabelIbsQced.tfam", outfile="gwasGenabelIbsQced.out",strand="u",bcast=10000)

After recoding for sex chromosomes I calculate genotype counts and allele frequencies using function.

However, when I have a look at the output I get some curious results. My understanding is that Q.2 is the allele frequency of the B allele.
When I look at A1 and A2 in the output, the alleles for some SNPs seem to have been switched relative to the raw data, for example:
                    Chromosome Position Strand A1 A2 NoMeasured CallRate  Q.2
ARS-BFGL-NGS-100232          0        0      u  G  A         10        1 0.20
ARS-BFGL-NGS-100372          0        0      u  G  A         10        1 0.35
ARS-BFGL-NGS-100549          0        0      u  G  A         10        1 0.20
ARS-BFGL-NGS-100941          0        0      u  A  G         10        1 0.05
ARS-BFGL-NGS-101104          0        0      u  A  G         10        1 0.30

In the raw Illumina data file, the polymorphism for SNP ARS-BFGL-BAC-13205 is coded as AG, but in GenABEL it seems to come out as GA, so allele A and allele B appear to have been switched. Is this correct? For my dataset Q.2 ranges from 0.0 to 0.5. The only explanation I can think of is that GenABEL determines which allele is minor and which is major, and switches them once identified so that the minor allele frequency is reported as Q.2.
Another symptom is that when I subset the data by phenotype and plot the allele frequencies of the subsets against each other, I get a truncated scatterplot: the spread of allele frequencies stops at ~0.5 on both the x and y axes.
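
To make my guess concrete, here is a minimal sketch in base R. It does not call any GenABEL functions, and the genotype counts and SNP names (snp1 to snp3) are made up purely for illustration. It only shows that always reporting the rarer allele would cap the frequencies at 0.5, which is the pattern I see; whether GenABEL actually works this way is my assumption.

```r
# Toy illustration only: made-up genotypes, no GenABEL calls.
# Copies of the raw "B" allele (0, 1 or 2) for 10 animals at 3 hypothetical SNPs.
b_counts <- cbind(
  snp1 = c(2, 2, 2, 1, 2, 2, 1, 2, 2, 2),
  snp2 = c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0),
  snp3 = c(1, 1, 0, 2, 1, 1, 2, 0, 1, 1)
)

# Frequency of the raw B allele: can fall anywhere between 0 and 1.
freq_b <- colMeans(b_counts) / 2

# "Folded" frequency: always report the rarer allele, so values never exceed 0.5.
freq_minor <- pmin(freq_b, 1 - freq_b)

# TRUE where the reported (rarer) allele is no longer the raw B allele.
swapped <- freq_b > 0.5

print(data.frame(freq_b, freq_minor, swapped))
```

If that is what is happening, then for a SNP flagged as swapped the frequency of the raw B allele would simply be 1 minus the reported value.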

Am I reading in the data incorrectly, or missing a step? Is there a way to force GenABEL to keep the allele coding as it appears in the raw data?



