[SOLVED] Formula for polygenic(), qtscore(), mlreg()

Questions about GenABEL (aka *ABEL) suite of packages
Forum rules
Please remember not to post any sensitive data on this public forum.
The first few posts of newly registered users will be moderated in order to filter out any spammers.

When get a solution to the problem you posted, please change the topic name (e.g. from "how to ..." to "[SOLVED] how to ..."). This will make it easier for the community to follow the posts yet to be attended.
MariaG
GenABEL specialist
GenABEL specialist
Posts: 36
Joined: Thu Feb 03, 2011 12:29 pm

[SOLVED] Formula for polygenic(), qtscore(), mlreg()

Postby MariaG » Fri Feb 04, 2011 11:56 am

Hello :)

I'm trying a very easy task, just to perform GWAS with GenABEL for two models:

(1) ph ~ gt
(2) ph ~ gt + age

Using lm(), I would compose the formula like:

lm(ph ~ gt)
lm(ph ~ gt + age)

But what is a right way to write a formula in polygenic(), qtscore(), mlreg() ?

If I understood correctly, it's supposed that "gt" is automatically incorporated into the formula and I don't have to write it in the formula manually.

1) I'm confused, why

polygenic(ph),
qtscore(ph)

work, but

mlreg(ph)

gives an error: "Error in formula.default(object, env = baseenv()) : invalid formula"

2) I'm not sure, for the second model will it be correct:

polygenic(ph~age),
qtscore(ph~age),
mlreg(ph~age)
?


I would really appreciate, if you give me exact formulas for both of my models just to provide a clear examples.

Thanks a lot!

Maria
Last edited by MariaG on Mon Feb 28, 2011 3:14 pm, edited 1 time in total.

yurii
GenABEL developer
GenABEL developer
Posts: 263
Joined: Fri Jan 21, 2011 5:20 pm

Re: Formula for polygenic(), qtscore(), mlreg()

Postby yurii » Fri Feb 04, 2011 4:00 pm

The quick answer is that functions like qtscore are aware that if a model consist of single variable (like qtscore(pheno, ...)) this variable is an outcome and the model

pheno = mu + beta*SNP + e

should be fitted for each SNP in turn. Here, mu is an intercept, beta is the effect of SNP, SNP is a SNP genotype, and e is residual error.

Unlike qtscore, mlreg requires really correct (R-style) formula. That is, if there are no predictors, you need to say that you fit an intercept (+ SNPs, of cause). So, correct call to mlreg in case you only want SNP in you model is

mlreg(pheno ~ 1, ...)

then the model

pheno = mu + beta*SNP + e

will be applied.

Here is small example

Code: Select all

library(GenABEL)
data(ge03d2.clean)
# this works
a<-qtscore(bmi,data=ge03d2.clean)
# this works as well
a<-qtscore(bmi~age+sex,data=ge03d2.clean)
# this does NOT work
a<-mlreg(bmi,data=ge03d2.clean)
# this DOES work
a<-mlreg(bmi~1,data=ge03d2.clean)
# this works as well
a<-mlreg(bmi~age+sex,data=ge03d2.clean)


As you see, this is a bit confusing, but easy to solve if you know above (pheno ~ 1) trick. I have filled this in as a feature request at https://r-forge.r-project.org/tracker/i ... &atid=2061, hopefully it will get more consistent with time.

To keep the story complete, I should mention that "polygenic" is a different, as this is not a function to fit SNP effects genome-wide, but rather to estimate polygenic model. When you say polygenic(pheno ~ age + sex, ...) the model fitted is

pheno = mu + betaA*age + betaS*sex + G + e

where G is polygenic effect. When you say polygenic(pheno, ...), the model is

pheno = mu + G + e

Hope this helps!
Note that (Gen)ABELs are dynamically developing; while this post is intended to provide full information at the time of posting, please read on further posts, if any, as the topic may be updated with novel solutions at a later stage.

best regards,
Yurii

MariaG
GenABEL specialist
GenABEL specialist
Posts: 36
Joined: Thu Feb 03, 2011 12:29 pm

Re: Formula for polygenic(), qtscore(), mlreg()

Postby MariaG » Mon Feb 07, 2011 8:44 am

Thank you, Yurii, that's exactly what I need. And thanks for the so detailed model descriptions - important to understand!

About mlreg(pheno ~ 1, ...) trick I was thinking, because it's normal for linear models approach. However I was not sure, because in the script you use .C-function, and it's not clear what's going on inside the C-script, how do you construct a model ;)

(1) So, do I anderstand correctly, that mlreg() works more like lm(), but not like glm()?
And again I'm confused, why

SNP.num <- as.numeric(data@gtdata[, "SNP.top.1"]) # for the top hit SNP from mlreg(pheno ~ 1, ...)
summary(lm(pheno ~ SNP.num, data = data@phdata, family = "gaussian"))

lm() doesn't provide the same result as mlreg(pheno ~ 1, ...) ?
The "Estimate" from lm() is equal to the "effB" from mlreg(),
but p-value from lm() is something like e-07, while p-value from mlreg() is e-08...


(2) Some more questions about the mmscore().

As I can see from a paper mentioned in the references of the help(mmscore) menu, there are actually 2 approaches: T-score and T-LRT statistics.
Which approach is implemented in mmscore()?
If only T-score, is then T-LRT implemented anywhere in ABEL?
Would you recommend mmscore() only as an initial screening phase, which should be reevaluated with another statistic to avoid an excess of false-positive results in regions of strong linkage or you would say mmscore() is fine for GWAS?

Which model for association is used in mmscore(): pheno = mu + beta*genotype.score + e
or extended model for association: pheno = mu + beta*expected.genotype.score + e ,
where unobserved genotypes supposed to be estimated?

Thank you

yurii
GenABEL developer
GenABEL developer
Posts: 263
Joined: Fri Jan 21, 2011 5:20 pm

Re: Formula for polygenic(), qtscore(), mlreg()

Postby yurii » Mon Feb 07, 2011 4:54 pm

I now answer question

(1) So, do I anderstand correctly, that mlreg() works more like lm(), but not like glm()?


when trait="gaussian", mlreg works like lm, when trait="binomial" it works like glm(...,family=binomial), when trait="survival", it works like 'coxph' (see help(mlreg))

Here are two examples

Code: Select all

# mlreg to analyze quantitative data
mlrQ <- mlreg(bmi~1,data=ge03d2.clean,trait="gaussian")
summary(mlrQ)
bestSNP <- "rs70099"
# analyze best SNP using LM
lmQ <- summary(lm(bmi ~ as.numeric(ge03d2.clean[,bestSNP]),
                                 data=phdata(ge03d2.clean)))
# analyze best SNP using GLM-gaussian
glmQ <- summary(glm(bmi ~ as.numeric(ge03d2.clean[,bestSNP]),
                                 data=phdata(ge03d2.clean),
                                 fam="gaussian"))
# from all three methods betas and sebetas are exactly the same
mlrQ[bestSNP,,drop=F]
lmQ
glmQ
# hence Wald test statistics is the same as well
mlrQ[bestSNP,"chi2.1df",drop=F]
(lmQ$coef[2,"Estimate"]/lmQ$coef[2,"Std. Error"])^2
(glmQ$coef[2,"Estimate"]/glmQ$coef[2,"Std. Error"])^2

Now, you see that estimates and se's are the same, but p-values are somewhat different. This happens because t-test applied by lm and glm while Wald is applied by mlreg. These differences are usually small when number of subjects is large (say, > 200). Applying Wald (what mlreg does) is totally OK here (as well as applying t-test or score test or LRT would)

And here is an example for logistic regression:

Code: Select all

# make a fake binary trait
bt <- 1*(phdata(ge03d2.clean)$bmi>30)
# logistic regression with mlreg
mlrB <- mlreg(bt~1,data=ge03d2.clean,trait="binomial")
# logistic regression with glm-binomial
glmB <- summary(glm(bt ~ as.numeric(ge03d2.clean[,bestSNP]),
                                 data=phdata(ge03d2.clean),
                                 fam="binomial"))
# see that results are very much the same, also for p-values
# because Wald test is used by both mlreg and glm-binomial
mlrB[bestSNP,,drop=F]
glmB


Here results are the same, also for P-values because all functions use Wald in this case.

Hope this explains the story.
Note that (Gen)ABELs are dynamically developing; while this post is intended to provide full information at the time of posting, please read on further posts, if any, as the topic may be updated with novel solutions at a later stage.

best regards,
Yurii

yurii
GenABEL developer
GenABEL developer
Posts: 263
Joined: Fri Jan 21, 2011 5:20 pm

Re: Formula for polygenic(), qtscore(), mlreg()

Postby yurii » Mon Feb 07, 2011 5:07 pm

Now answer for

(2) Some more questions about the mmscore().


mmscore = T-score (but mind that we use genomic kinship and not pedigree kinship as a rule)

T-LRT is not implemented in GenABEL, but this is something we work together with William Astle to have in MixABEL

mmscore() is very good method, so I would not say "screening only" -- rather this is something which can be used right away

Model is described in details in Goncalo's paper (http://www.ncbi.nlm.nih.gov/pubmed/16685720) or see ProbABEL paper (http://www.biomedcentral.com/1471-2105/11/134) for extension of this approach; technical details are given in help(mmscore) and manual for ProbABEL.

In short, following my notation, the model is

phenotype* = mu + beta*SNP + G + e

with phenotype* being fixed-effects adjusted residuals from polygenic model, and parameters underlying distribution of random effects (G & e) not being estimated for each SNP, but rather taken from polygenic model

phenotype = mu + ...[extra covaruates]... + G + e

Hope this helps!
Note that (Gen)ABELs are dynamically developing; while this post is intended to provide full information at the time of posting, please read on further posts, if any, as the topic may be updated with novel solutions at a later stage.

best regards,
Yurii


Return to “GenABEL”

Who is online

Users browsing this forum: No registered users and 3 guests