Notes on GWAS Quality Control/Quality Assurance

Posted: Sun Apr 17, 2011 9:59 pm
by MariaG
While doing association analysis, e.g. with GenABEL, I wondered how people choose the appropriate methods and parameter values for Quality Control, and how to check Quality Assurance afterwards.

Here I provide some links which were useful for me.
I’m pretty sure you can add even better links! Please do :P

Extensive overview of GWA in general, and of analysis quality in particular:
“Genome-wide association studies for complex traits: consensus, uncertainty and challenges”, McCarthy, Abecasis, Cardon, Goldstein, Little, Ioannidis and Hirschhorn. Nature Reviews Genetics, 2008
http://www.nature.com/nrg/journal/v9/n5/abs/nrg2344.html

What I personally liked in the paper:

(a) They stress that there is a “propensity for latent population structure (population stratification and cryptic relatedness) to inflate the type 1 error rate and generate spurious claims of association around variants that are informative for that substructure”.
To check whether my data show population stratification, and to get rid of its effect, I can follow the instructions in the “GenABEL Tutorial”.

(b) The paper suggests checking for sample duplicates.
To do that I can look for duplicated subject names, and also use a high threshold for the IBS (identical-by-state) parameter (e.g. in GenABEL ibs.threshold = 0.95).
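
A minimal sketch of the IBS part of this check, written in plain Python rather than GenABEL. It assumes genotypes coded 0/1/2 (copies of one allele) with None for missing calls; `ibs` and `find_duplicates` are hypothetical helper names, the sample ids are made up, and 0.95 simply mirrors the ibs.threshold example above. IBS is taken here as the average proportion of alleles shared over the SNPs called in both samples.

```python
from itertools import combinations

def ibs(g1, g2):
    """Mean proportion of alleles shared identical-by-state between two samples.

    Genotypes are coded 0, 1, 2 (copies of one allele); None = missing call.
    """
    shared, n = 0.0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs not called in both samples
        shared += 1.0 - abs(a - b) / 2.0  # 1, 0.5 or 0 per SNP
        n += 1
    return shared / n if n else float("nan")

def find_duplicates(genotypes, threshold=0.95):
    """Return (id1, id2, ibs) for every pair of samples with IBS >= threshold."""
    hits = []
    for (id1, g1), (id2, g2) in combinations(sorted(genotypes.items()), 2):
        value = ibs(g1, g2)
        if value >= threshold:
            hits.append((id1, id2, value))
    return hits

# Toy data: s1 and s2 are the same DNA typed twice (one missing call in s2)
genotypes = {
    "s1": [0, 1, 2, 2, 1, 0, 2, 1],
    "s2": [0, 1, 2, 2, None, 0, 2, 1],
    "s3": [2, 0, 1, 0, 1, 2, 0, 1],
}
duplicates = find_duplicates(genotypes)
print(duplicates)  # only the pair (s1, s2) is flagged
```

With real data the pairwise loop is quadratic in the number of samples, so a random subset of a few thousand SNPs is usually enough for this purpose.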

(c) The paper shows how to assess the genotyping quality of a single SNP using signal intensity (cluster) plots.
Another link shows that knowing how good the clustering is can also help to choose the right threshold for Hardy-Weinberg Equilibrium (HWE) filtering: the HWE p-value threshold should be set so that most of the SNPs passing it have good-quality cluster plots:
“Quality Control and Quality Assurance in Genotypic Data for Genome-Wide Association Studies”, Laurie et al. Genetic Epidemiology, 2010
http://www.ncbi.nlm.nih.gov/pubmed/20718045

(d) The paper describes the use of the Q-Q plot for diagnosing GWA results. So, based on the features of the Q-Q plot, I can choose the appropriate approach (QC parameters and the mathematical model) for GWA. The important detail is: if the observed values are already inflated in the region close to (0,0), this most probably indicates population stratification or cryptic relatedness. If the inflation is seen only towards the higher (x,y) values, it can suggest an excess of strong associations.

Another helpful method (please see the link below) to check whether the inflation comes from population stratification or from true association: exclude the top SNPs with the lowest p-values and construct the Q-Q plot again. If I then see no inflation at higher (x,y), most probably the deviation was caused by the association:
“How to Interpret a Genome-wide Association Study.” Pearson, Manolio. JAMA, 2008
http://jama.ama-assn.org/content/299/11/1335.full?maxtoshow=&hits=10&RESULTFORMAT=&fulltext=Inflammatory+Bowel+Disease&searchid=1&FIRSTINDEX=0&resourcetype=HWFIG
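
The Q-Q check and the "drop the top SNPs and redo it" trick can be sketched as follows. This is only an illustration in plain Python: `qq_points`, `drop_top_snps` and `max_excess` are hypothetical helper names, the expected quantiles use the common (rank - 0.5)/n convention, and the p-values are artificial (a perfectly uniform grid plus three made-up strong hits). All p-values must be greater than 0.

```python
import math

def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first."""
    observed = sorted(pvalues)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n  # quantile expected under the null (uniform p)
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points

def drop_top_snps(pvalues, n_top):
    """Remove the n_top smallest p-values (the strongest signals)."""
    return sorted(pvalues)[n_top:]

def max_excess(points):
    """Largest amount by which observed exceeds expected on the -log10 scale."""
    return max(obs - exp for exp, obs in points)

# Artificial example: 1000 null SNPs on an exact uniform grid + 3 strong hits
null_p = [(i - 0.5) / 1000 for i in range(1, 1001)]
pvals = null_p + [1e-8, 1e-9, 1e-10]

before = max_excess(qq_points(pvals))
after = max_excess(qq_points(drop_top_snps(pvals, 3)))
print(before, after)  # large excess before, none after removing the 3 hits
```

Here the deviation sits only in the tail and vanishes once the top hits are removed, which is the pattern attributed above to true association. With stratification one would expect the points to stay above the y = x line even after the top SNPs are dropped.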

The other two links below show that the Q-Q plot can also be used to assess the quality of HWE filtering. Deviations from the y = x line correspond to loci that deviate from the null hypothesis. So I can check which HWE p-value threshold gives the smallest deviation, and use it as a QC parameter.
“A tutorial on statistical methods for population association studies”, Balding. Nature reviews/Genetics, 2006
http://www.nature.com/nrg/journal/v7/n10/full/nrg1916.html
“Notes on GWAS QC/QA”, Weale. 2009
http://www.docstoc.com/docs/41239840/MEWGWASQCNotes20090205
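
For reference, a rough sketch of where the HWE p-values come from and how a threshold is applied. It uses the simple chi-square (1 df) approximation from genotype counts; many packages use an exact test instead, so treat this as an illustration only. The function name `hwe_chisq_pvalue`, the SNP names and the counts are all made up.

```python
import math

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2.0))

# Hypothetical SNPs with counts (AA, AB, BB) in 1000 samples
snps = {
    "rs_ok": (810, 180, 10),   # exactly the HWE proportions for allele frequency 0.9
    "rs_bad": (850, 100, 50),  # same allele frequency, far too few heterozygotes
}
pvalues = {name: hwe_chisq_pvalue(*counts) for name, counts in snps.items()}

# Compare how many SNPs would be removed at different candidate thresholds
for threshold in (1e-3, 1e-6):
    failed = [name for name, p in pvalues.items() if p < threshold]
    print(threshold, failed)
```

Running the loop over a range of thresholds on real data, and looking at the Q-Q plot of the SNPs that remain, is one way to do the comparison described above.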


What I didn’t find in the paper: information about “heterozygosity” filtering. According to the link mentioned above (Weale, 2009), samples with unusually high heterozygosity suggest possible sample contamination. In GenABEL’s default QC filtering the threshold is het.fdr = 0.01. I’m not sure I really get the idea behind that. Do you know a good link that clarifies the biological and statistical basis for heterozygosity filtering?

Re: Notes on GWAS Quality Control/Quality Assurance

Posted: Mon Apr 18, 2011 8:08 am
by yurii
Adding my 2 cents, here is the link on sex verification procedures

viewtopic.php?f=6&t=446

While it is GenABEL-centric, the post lists a few general principles.

Re: Notes on GWAS Quality Control/Quality Assurance

Posted: Mon Apr 18, 2011 8:22 am
by yurii
MariaG wrote:
What I didn’t find in the paper: information about “heterozygosity” filtering. According to the link mentioned above (Weale, 2009), samples with unusually high heterozygosity suggest possible sample contamination. In GenABEL’s default QC filtering the threshold is het.fdr = 0.01. I’m not sure I really get the idea behind that. Do you know a good link that clarifies the biological and statistical basis for heterozygosity filtering?


I think the idea is more or less like this: if you have DNA from a single person, there is a certain chance you will see a homozygote of a certain type, or a heterozygote. But if you mix in more and more DNAs, the chance that you will see BOTH alleles increases: e.g. if MAF is 0.1, the chance to see both alleles in one person is

P(AB)=2*.1*.9=0.18,

but if you look in the DNA of 2 people, you can see A and B if (person 1 is AB and person 2 is not AB) or (person 2 is AB and person 1 is not AB) or (both are AB) or (person 1 is AA and person 2 is BB) or (person 1 is BB and person 2 is AA), so

P(see A and B in 2 people) = 0.18*0.18+2*.18*(1-.18)+2*0.1^2*0.9^2 = 0.34 > 0.18
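
The same number can be checked in a couple of lines, and the argument extended to more than two DNAs. This sketch assumes Hardy-Weinberg proportions and unrelated people, as the calculation above does; `p_both_alleles` is a hypothetical helper name. It uses the shortcut that both alleles are seen unless all 2n chromosomes carry the same allele.

```python
def p_both_alleles(maf, n_people):
    """Probability that both alleles are seen among 2 * n_people chromosomes."""
    n_chrom = 2 * n_people
    return 1.0 - maf ** n_chrom - (1.0 - maf) ** n_chrom

maf = 0.1
het = 2 * maf * (1 - maf)               # P(AB) for one person
p_aa, p_bb = maf ** 2, (1 - maf) ** 2   # the two homozygote probabilities

# The case-by-case sum from the post, for comparison with the shortcut
by_cases = het * het + 2 * het * (1 - het) + 2 * p_aa * p_bb

print(round(by_cases, 4), round(p_both_alleles(maf, 2), 4))
for n in (1, 2, 3, 5):
    print(n, round(p_both_alleles(maf, n), 4))
```

Both routes give 0.3438 for two people, and the probability keeps growing as more DNAs are mixed, which is why a contaminated sample looks excessively heterozygous.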

As for statistics, any 'outlier detection' procedure can be used: e.g. 5 SD, or, as in the case of GenABEL-package::check.marker, samples with FDR<0.01 are considered outliers.
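
A minimal sketch of the simple SD rule (not of the FDR-based procedure in check.marker, which is not reproduced here). Genotypes are assumed to be coded 0/1/2 with None for missing; `heterozygosity` and `sd_outliers` are hypothetical helper names, and the per-sample values are invented.

```python
import statistics

def heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (coded 1; None = missing)."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)

def sd_outliers(het_by_sample, n_sd=5.0):
    """Return ids of samples whose heterozygosity is more than n_sd SDs from the mean."""
    values = list(het_by_sample.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sorted(s for s, h in het_by_sample.items() if abs(h - mean) > n_sd * sd)

# Invented per-sample heterozygosities: 60 ordinary samples around 0.30 ...
het_by_sample = {f"id{i}": 0.30 + 0.002 * ((i % 5) - 2) for i in range(60)}
# ... plus one sample that behaves like a mixture of DNAs
het_by_sample["mixed"] = 0.40

outliers = sd_outliers(het_by_sample)
print(outliers)
```

Note that the outlier itself inflates the mean and the SD, so with only a handful of samples a 5 SD rule can never trigger; more robust variants use the median and MAD instead.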

It is also quite common to see that samples which do not pass the heterozygosity criterion also have a low call rate. It would actually be interesting to look at the intensity plots for samples not passing the heterozygosity thresholds and try to guess how many DNAs are mixed together, and in what proportion. I do not quite see the practical side of this, but it may be fun.

Yurii

Re: Notes on GWAS Quality Control/Quality Assurance

Posted: Sun May 08, 2011 8:12 pm
by MariaG
Well, a further QC question, for example, is:

How important is it to check the "normality" of the phenotype? I haven't seen any GWAS paper where people mention the quality of the phenotype. Even if they use a "sqrt" or "log" transformation, they usually don't show the p-values for normality. What would you say about that? I would appreciate it if you could give me some useful links :)

Thanks

Re: Notes on GWAS Quality Control/Quality Assurance

Posted: Thu May 26, 2011 3:00 am
by Nicola Pirastu
Well this is not a reference in the strict sense but I think it clarifies some points.

http://www.duke.edu/~rnau/testing.htm

So, as you can read there, normality of the trait is not strictly required; the important thing is that the errors are normally distributed, especially for estimating p-values.

As far as I know no one reports tests of normality for traits, for two main reasons:

1. It is not required for the reasons mentioned above

2. If you run a Shapiro test on real data it will always be significant (I'd love to see someone who could actually find a really normally distributed trait).
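
To illustrate the first point with a toy simulation (everything here is invented: a binary covariate called `sex`, a shift of 10 units between groups, and normal errors): the trait itself is clearly bimodal, but the residuals after adjusting for the covariate are normal. Excess kurtosis (about 0 for normal data) is used only as a quick numeric summary of shape, not as a formal test; `excess_kurtosis` is a hypothetical helper name.

```python
import random
import statistics

def excess_kurtosis(values):
    """Sample excess kurtosis; about 0 for normally distributed data."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m4 = statistics.fmean([(v - mean) ** 4 for v in values])
    return m4 / (m2 ** 2) - 3.0

rng = random.Random(1)
sex = [i % 2 for i in range(2000)]                     # invented binary covariate
trait = [10.0 * s + rng.gauss(0.0, 1.0) for s in sex]  # normal errors around group means

# Regressing on a single binary covariate amounts to subtracting each group's mean
group_mean = {
    s: statistics.fmean([t for t, c in zip(trait, sex) if c == s]) for s in (0, 1)
}
residuals = [t - group_mean[s] for t, s in zip(trait, sex)]

print(round(excess_kurtosis(trait), 2), round(excess_kurtosis(residuals), 2))
```

The raw trait is far from normal (strongly negative excess kurtosis because of the two modes), while the residuals are close to 0, so a normality check on the trait alone would be misleading here.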



Nicola

Re: Notes on GWAS Quality Control/Quality Assurance

Posted: Thu Apr 12, 2012 9:26 am
by Jodie81
Thanks a lot for sharing, Maria and yurii!
I'm writing a talk on this topic, so your thoughts on it mean a lot.
That's very useful for me!

