Best reference set

Statistical Genomics for dummies and advanced. Discussions, links, useful information.
Nicola Pirastu
GenABEL senior expert
Posts: 151
Joined: Wed Feb 09, 2011 3:24 pm

Best reference set

Post by Nicola Pirastu » Tue Jul 05, 2011 8:48 pm


I'm not sure this is the right place for this post, but it seemed more appropriate than the GenABEL forum:

I'm trying to select samples for whole-genome sequencing in order to create my own reference set instead of using HapMap or 1000G.
Since I have some isolated populations, my idea is to choose the best set based on kinship, so that the imputation is more accurate and I lose as little power as possible for association.
I've found out that minimac hides each marker and then compares the imputed marker with the original one. The output contains three different measures of accuracy. The first is the canonical Rsq as in MACH, called looRSQ. The second is empR, which the wiki defines as: the empirical correlation between true and imputed genotypes for the SNP. If this is negative, the SNP is probably flipped.
The third is empRSQ, defined as: the actual R2 value, comparing imputed and true genotypes.
From what I understand, I should look at the third one to assess accuracy; however, I'm not sure, and I could not find any literature on it.

Am I interpreting this correctly, or should I try something different?
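To make the second and third measures concrete, here is a minimal Python sketch of how an empirical correlation and its square could be computed for a single SNP. It assumes that empR is the Pearson correlation between true genotypes (coded as allele counts 0/1/2) and imputed dosages, and that empRSQ is its square; this is an interpretation of the wiki definitions quoted above, not minimac's actual code. The helper `pearson_r` and the example values are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one input has no variance")
    return sxy / sqrt(sxx * syy)


# Made-up data for one SNP (illustration only): true genotypes as allele
# counts, and the dosages imputed for the same samples with the SNP masked.
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_dosages = [0.1, 0.9, 1.8, 1.2, 0.2, 1.9, 0.7, 0.1]

emp_r = pearson_r(true_genotypes, imputed_dosages)  # assumed meaning of empR
emp_rsq = emp_r ** 2                                # assumed meaning of empRSQ

# Counting the other allele (2 - dosage) mimics a flipped SNP: the
# correlation changes sign while its square stays the same.
flipped_r = pearson_r(true_genotypes, [2 - d for d in imputed_dosages])

print(f"empR={emp_r:.3f}  empRSQ={emp_rsq:.3f}  flipped empR={flipped_r:.3f}")
```

Under these assumptions the sign of empR only carries strand or allele-coding information, which is why the squared value is the natural accuracy measure.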



GenABEL developer
Posts: 263
Joined: Fri Jan 21, 2011 5:20 pm

Re: Best reference set

Post by yurii » Wed Jun 06, 2012 10:26 am

From reading your post (note that I have no experience with minimac), I would agree that the last measure is probably the best, although it should also correlate almost perfectly with the first. Also, #3 should be the same as #2 squared. I wonder if this is correct :)
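Both guesses can be checked directly on the per-SNP output. The Python sketch below is one way to do it; the records are invented toy values, not real minimac output, and the tuple layout (SNP name, looRSQ, empR, empRSQ) is an assumption made for the example rather than the actual file format.

```python
from math import sqrt

# Hypothetical per-SNP values, invented for illustration only:
# (SNP name, looRSQ, empR, empRSQ)
records = [
    ("snp1", 0.95, 0.97, 0.9409),
    ("snp2", 0.60, 0.80, 0.6400),
    ("snp3", 0.30, -0.50, 0.2500),
    ("snp4", 0.85, 0.90, 0.8100),
]


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Check 1: is empRSQ the same as empR squared? Report the largest gap.
max_gap = max(abs(emp_r ** 2 - emp_rsq) for _, _, emp_r, emp_rsq in records)

# Check 2: how closely does looRSQ track empRSQ across SNPs?
loo_rsq = [rec[1] for rec in records]
emp_rsq_values = [rec[3] for rec in records]
agreement = pearson_r(loo_rsq, emp_rsq_values)

# SNPs with negative empR are candidates for a strand or allele flip.
possibly_flipped = [name for name, _, emp_r, _ in records if emp_r < 0]

print(f"max |empR^2 - empRSQ| = {max_gap:.2e}")
print(f"correlation(looRSQ, empRSQ) = {agreement:.3f}")
print("possibly flipped:", possibly_flipped)
```

On real output, a large gap in the first check would suggest that empRSQ is computed differently from a plain squared correlation, and a weak correlation in the second would suggest that looRSQ is a poor proxy for the empirical measure in that data set.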

best regards,
