How to export specific SNPs or regions

Questions about GenABEL (aka *ABEL) suite of packages
Forum rules
Please remember not to post any sensitive data on this public forum.
The first few posts of newly registered users will be moderated in order to filter out any spammers.

When get a solution to the problem you posted, please change the topic name (e.g. from "how to ..." to "[SOLVED] how to ..."). This will make it easier for the community to follow the posts yet to be attended.
canmedgen
Posts: 1
Joined: Sat Jan 11, 2014 10:34 am

How to export specific SNPs or regions

Postby canmedgen » Sat Jan 11, 2014 12:24 pm

Hi
The GenAbel manual tells that you get out a specifc SNP given by its number ("as.character(gtdata(data)[1:4, 1177])")
It also works nicely for one SNP using SNPname ("as.character(gtdata(data)[ ,"SNPxxxx"])- and makes a list on the screen
How can I get out all SNPs in a chromosomal region for all individual in a FILE by giving chromosome number and positions?
Also how can I get out a list of several specific SNPs in a file?

lckarssen
Site Admin
Site Admin
Posts: 322
Joined: Tue Jan 04, 2011 3:04 pm
Location: Utrecht, The Netherlands

Re: How to export specific SNPs or regions

Postby lckarssen » Tue Jan 14, 2014 2:36 pm

canmedgen wrote:How can I get out all SNPs in a chromosomal region for all individual in a FILE by giving chromosome number and positions?

The chromosome information in a gtdata object can be found using the chromosome function. The positions can be found using map()

Code: Select all

> data(ge03d2)    # Load the GenABEL example data

> # See which chromosomes are present in the data and woith how many SNPs:
> table(chromosome(gtdata(ge03d2)))

   1    2    3    X
3555 1999 1466  569

> # See the positions of the first 10 SNPs
> head(map(gtdata(ge03d2)))
rs1646456 rs7950586 rs4785242 rs4435802 rs2847446 rs9308393
      653       849      1766      5291      5555      6739

> # This gets all SNPs on chromosome 2 for IDs 1 through 4:
> chr2data <- as.character(gtdata(ge03d2)[1:4, which(chromosome(gtdata(ge03d2)[1:4,]) == "2")])

So if you define a list of positions and chromosomes (e.g. by reading from a file), you can extract the data like this:

Code: Select all

> poslist <- c(13661657, 4555858, 1766)    # List of positions
> # Names of SNPs in our position list (only those positions from the list that are actually present)
> snpnms <- snpnames(gtdata(ge03d2)[, which(map(gtdata(ge03d2)) %in% poslist)]
> # Show which chromosomes the selected positions are on
> chromosome(gtdata(ge03d2)[, which(snpnames(gtdata(ge03d2)) %in% snpnms)])
rs4785242  rs364524 rs3948926
      "1"       "1"       "X"
> # So our selected positions are on chromosomes 1 and X.

> chromlist <- c("1")   # Taking only chromosome one for simplicity
> idlist <- 1:4  # taking the first 4 individuals


Combining everything we get:

Code: Select all

> selection <- as.character(gtdata(ge03d2)[idlist, which(chromosome(gtdata(ge03d2)) %in% chromlist & map(gtdata(ge03d2)) %in% poslist)])
> selection
     rs4785242 rs364524
id4  "T/C"     "C/C"
id10 "T/C"     "T/C"
id25 "C/C"     "C/C"
id33 "T/C"     "C/C"

So, as expected we have data on two SNPs on chromosome 1 for the four individuals.

All in all not very simple. Maybe someone else has a more elegant solution?


Also how can I get out a list of several specific SNPs in a file?

If you have the SNP names in a file (e.g. called snpfile.txt you can read that file into a data frame and simply refer to the column (V1 if the file doesn't have a header line:

Code: Select all

> snpsFromFile <- read.table("snpfile")
> snpsFromFile
         V1
1 rs4785242
2 rs4435802
> as.character(gtdata(ge03d2))[1:4,snpsFromFile$V1]
     rs7950586 rs1646456
id4  "T/A"     "C/G"
id10 "T/T"     "C/C"
id25 "T/T"     "C/G"
id33 "T/T"     "C/C"
-------
Lennart Karssen
PolyOmica
The Netherlands
-------


Return to “GenABEL”

Who is online

Users browsing this forum: No registered users and 2 guests