How to eliminate duplicated genotype data?

Questions about GenABEL (aka *ABEL) suite of packages
lzhao
GenABEL suite user
Posts: 10
Joined: Thu Jan 23, 2014 8:56 pm

How to eliminate duplicated genotype data?

Postby lzhao » Thu Jan 23, 2014 9:10 pm

As a new user, I find GenABEL very convenient for handling GWAS data. At the moment I have one problem: my data set includes duplicated genotyping data, i.e., some samples have more than one set of genotype data. It looks like load.gwaa.data reads in the data without flagging this. However, the program gives an error message when performing QC, which is expected. My question is whether there is a way to modify load.gwaa.data so that it reports the duplication (much like the reminder about samples without any genotype data) and then loads only the first occurrence. In practice, such duplications are often expected for a variety of reasons, so a solution would help me. Thanks.
Last edited by lzhao on Mon Mar 10, 2014 4:39 pm, edited 1 time in total.

lckarssen
Site Admin
Posts: 322
Joined: Tue Jan 04, 2011 3:04 pm
Location: Utrecht, The Netherlands

Re: Duplicated Genotype Data

Postby lckarssen » Tue Jan 28, 2014 4:16 pm

Good point. GenABEL should give a warning when a sample is typed more than once. Could you file a bug report for this in our bugtracker? Thanks!

Instead of doing QC immediately, you could filter out duplicate IDs first. Something along these lines:
  • Make a logical vector (say uniqIDs) that is FALSE wherever an ID occurs for the second or a later time. For example, use idnames(gtdata(originalData)) to get the ID names.
  • Subset your gwaa.data-class object with that logical vector, like this: uniqData <- originalData[uniqIDs, ]
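A minimal sketch of those two steps, using base R's duplicated() on a plain character vector so that it runs without GenABEL installed. The sample IDs are made up for illustration, and the object name originalData in the commented lines is an assumption standing in for your own gwaa.data-class object:

```r
# Hypothetical sample IDs standing in for idnames(gtdata(originalData));
# "S2" and "S3" were genotyped more than once.
ids <- c("S1", "S2", "S3", "S2", "S4", "S3")

# TRUE for the first occurrence of each ID, FALSE for every repeat.
uniqIDs <- !duplicated(ids)

# List the IDs that occur more than once, as a reminder before dropping them.
dupIDs <- unique(ids[duplicated(ids)])
if (length(dupIDs) > 0) {
  message("Duplicated sample IDs: ", paste(dupIDs, collapse = ", "))
}

# With GenABEL loaded, the same logical vector would be used for subsetting
# (not run here; originalData is an assumed object name):
# ids      <- idnames(gtdata(originalData))
# uniqData <- originalData[!duplicated(ids), ]
```

Because duplicated() marks only the second and later occurrences, this keeps the first genotype set seen for each sample, which is what the original question asked for. Whether the first set is the best one to keep (rather than, say, the one with the highest call rate) is a separate decision.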
Last edited by lckarssen on Tue Jan 28, 2014 4:17 pm, edited 1 time in total.
Reason: Fixed URL
-------
Lennart Karssen
PolyOmica
The Netherlands
-------

