Hello

I'm trying a very easy task, just to perform GWAS with GenABEL for two models:

(1) ph ~ gt

(2) ph ~ gt + age

Using lm(), I would compose the formula like:

lm(ph ~ gt)

lm(ph ~ gt + age)

But what is a right way to write a formula in polygenic(), qtscore(), mlreg() ?

If I understood correctly, it's supposed that "gt" is automatically incorporated into the formula and I don't have to write it in the formula manually.

1) I'm confused, why

polygenic(ph),

qtscore(ph)

work, but

mlreg(ph)

gives an error: "Error in formula.default(object, env = baseenv()) : invalid formula"

2) I'm not sure, for the second model will it be correct:

polygenic(ph~age),

qtscore(ph~age),

mlreg(ph~age)

?

I would really appreciate, if you give me exact formulas for both of my models just to provide a clear examples.

Thanks a lot!

Maria

## [SOLVED] Formula for polygenic(), qtscore(), mlreg()

**Forum rules**

Please remember not to post any sensitive data on this public forum.

The first few posts of newly registered users will be moderated in order to filter out any spammers.

When get a solution to the problem you posted, please change the topic name (e.g. from "how to ..." to "[SOLVED] how to ..."). This will make it easier for the community to follow the posts yet to be attended.

### [SOLVED] Formula for polygenic(), qtscore(), mlreg()

Last edited by MariaG on Mon Feb 28, 2011 3:14 pm, edited 1 time in total.

### Re: Formula for polygenic(), qtscore(), mlreg()

The quick answer is that functions like qtscore are aware that if a model consist of single variable (like qtscore(pheno, ...)) this variable is an outcome and the model

pheno = mu + beta*SNP + e

should be fitted for each SNP in turn. Here, mu is an intercept, beta is the effect of SNP, SNP is a SNP genotype, and e is residual error.

Unlike qtscore, mlreg requires really correct (R-style) formula. That is, if there are no predictors, you need to say that you fit an intercept (+ SNPs, of cause). So, correct call to mlreg in case you only want SNP in you model is

mlreg(pheno ~ 1, ...)

then the model

pheno = mu + beta*SNP + e

will be applied.

Here is small example

As you see, this is a bit confusing, but easy to solve if you know above (pheno ~ 1) trick. I have filled this in as a feature request at https://r-forge.r-project.org/tracker/i ... &atid=2061, hopefully it will get more consistent with time.

To keep the story complete, I should mention that "polygenic" is a different, as this is not a function to fit SNP effects genome-wide, but rather to estimate polygenic model. When you say polygenic(pheno ~ age + sex, ...) the model fitted is

pheno = mu + betaA*age + betaS*sex + G + e

where G is polygenic effect. When you say polygenic(pheno, ...), the model is

pheno = mu + G + e

Hope this helps!

pheno = mu + beta*SNP + e

should be fitted for each SNP in turn. Here, mu is an intercept, beta is the effect of SNP, SNP is a SNP genotype, and e is residual error.

Unlike qtscore, mlreg requires really correct (R-style) formula. That is, if there are no predictors, you need to say that you fit an intercept (+ SNPs, of cause). So, correct call to mlreg in case you only want SNP in you model is

mlreg(pheno ~ 1, ...)

then the model

pheno = mu + beta*SNP + e

will be applied.

Here is small example

Code: Select all

`library(GenABEL)`

data(ge03d2.clean)

# this works

a<-qtscore(bmi,data=ge03d2.clean)

# this works as well

a<-qtscore(bmi~age+sex,data=ge03d2.clean)

# this does NOT work

a<-mlreg(bmi,data=ge03d2.clean)

# this DOES work

a<-mlreg(bmi~1,data=ge03d2.clean)

# this works as well

a<-mlreg(bmi~age+sex,data=ge03d2.clean)

As you see, this is a bit confusing, but easy to solve if you know above (pheno ~ 1) trick. I have filled this in as a feature request at https://r-forge.r-project.org/tracker/i ... &atid=2061, hopefully it will get more consistent with time.

To keep the story complete, I should mention that "polygenic" is a different, as this is not a function to fit SNP effects genome-wide, but rather to estimate polygenic model. When you say polygenic(pheno ~ age + sex, ...) the model fitted is

pheno = mu + betaA*age + betaS*sex + G + e

where G is polygenic effect. When you say polygenic(pheno, ...), the model is

pheno = mu + G + e

Hope this helps!

Note that (Gen)ABELs are dynamically developing; while this post is intended to provide full information at the time of posting, please read on further posts, if any, as the topic may be updated with novel solutions at a later stage.

best regards,

Yurii

best regards,

Yurii

### Re: Formula for polygenic(), qtscore(), mlreg()

Thank you, Yurii, that's exactly what I need. And thanks for the so detailed model descriptions - important to understand!

About mlreg(pheno ~ 1, ...) trick I was thinking, because it's normal for linear models approach. However I was not sure, because in the script you use .C-function, and it's not clear what's going on inside the C-script, how do you construct a model

(1) So, do I anderstand correctly, that mlreg() works more like lm(), but not like glm()?

And again I'm confused, why

SNP.num <- as.numeric(data@gtdata[, "SNP.top.1"]) # for the top hit SNP from mlreg(pheno ~ 1, ...)

summary(lm(pheno ~ SNP.num, data = data@phdata, family = "gaussian"))

lm() doesn't provide the same result as mlreg(pheno ~ 1, ...) ?

The "Estimate" from lm() is equal to the "effB" from mlreg(),

but p-value from lm() is something like e-07, while p-value from mlreg() is e-08...

(2) Some more questions about the mmscore().

As I can see from a paper mentioned in the references of the help(mmscore) menu, there are actually 2 approaches: T-score and T-LRT statistics.

Which approach is implemented in mmscore()?

If only T-score, is then T-LRT implemented anywhere in ABEL?

Would you recommend mmscore() only as an initial screening phase, which should be reevaluated with another statistic to avoid an excess of false-positive results in regions of strong linkage or you would say mmscore() is fine for GWAS?

Which model for association is used in mmscore(): pheno = mu + beta*genotype.score + e

or extended model for association: pheno = mu + beta*expected.genotype.score + e ,

where unobserved genotypes supposed to be estimated?

Thank you

About mlreg(pheno ~ 1, ...) trick I was thinking, because it's normal for linear models approach. However I was not sure, because in the script you use .C-function, and it's not clear what's going on inside the C-script, how do you construct a model

(1) So, do I anderstand correctly, that mlreg() works more like lm(), but not like glm()?

And again I'm confused, why

SNP.num <- as.numeric(data@gtdata[, "SNP.top.1"]) # for the top hit SNP from mlreg(pheno ~ 1, ...)

summary(lm(pheno ~ SNP.num, data = data@phdata, family = "gaussian"))

lm() doesn't provide the same result as mlreg(pheno ~ 1, ...) ?

The "Estimate" from lm() is equal to the "effB" from mlreg(),

but p-value from lm() is something like e-07, while p-value from mlreg() is e-08...

(2) Some more questions about the mmscore().

As I can see from a paper mentioned in the references of the help(mmscore) menu, there are actually 2 approaches: T-score and T-LRT statistics.

Which approach is implemented in mmscore()?

If only T-score, is then T-LRT implemented anywhere in ABEL?

Would you recommend mmscore() only as an initial screening phase, which should be reevaluated with another statistic to avoid an excess of false-positive results in regions of strong linkage or you would say mmscore() is fine for GWAS?

Which model for association is used in mmscore(): pheno = mu + beta*genotype.score + e

or extended model for association: pheno = mu + beta*expected.genotype.score + e ,

where unobserved genotypes supposed to be estimated?

Thank you

### Re: Formula for polygenic(), qtscore(), mlreg()

I now answer question

when trait="gaussian", mlreg works like lm, when trait="binomial" it works like glm(...,family=binomial), when trait="survival", it works like 'coxph' (see help(mlreg))

Here are two examples

Now, you see that estimates and se's are the same, but p-values are somewhat different. This happens because t-test applied by lm and glm while Wald is applied by mlreg. These differences are usually small when number of subjects is large (say, > 200). Applying Wald (what mlreg does) is totally OK here (as well as applying t-test or score test or LRT would)

And here is an example for logistic regression:

Here results are the same, also for P-values because all functions use Wald in this case.

Hope this explains the story.

(1) So, do I anderstand correctly, that mlreg() works more like lm(), but not like glm()?

when trait="gaussian", mlreg works like lm, when trait="binomial" it works like glm(...,family=binomial), when trait="survival", it works like 'coxph' (see help(mlreg))

Here are two examples

Code: Select all

`# mlreg to analyze quantitative data`

mlrQ <- mlreg(bmi~1,data=ge03d2.clean,trait="gaussian")

summary(mlrQ)

bestSNP <- "rs70099"

# analyze best SNP using LM

lmQ <- summary(lm(bmi ~ as.numeric(ge03d2.clean[,bestSNP]),

data=phdata(ge03d2.clean)))

# analyze best SNP using GLM-gaussian

glmQ <- summary(glm(bmi ~ as.numeric(ge03d2.clean[,bestSNP]),

data=phdata(ge03d2.clean),

fam="gaussian"))

# from all three methods betas and sebetas are exactly the same

mlrQ[bestSNP,,drop=F]

lmQ

glmQ

# hence Wald test statistics is the same as well

mlrQ[bestSNP,"chi2.1df",drop=F]

(lmQ$coef[2,"Estimate"]/lmQ$coef[2,"Std. Error"])^2

(glmQ$coef[2,"Estimate"]/glmQ$coef[2,"Std. Error"])^2

Now, you see that estimates and se's are the same, but p-values are somewhat different. This happens because t-test applied by lm and glm while Wald is applied by mlreg. These differences are usually small when number of subjects is large (say, > 200). Applying Wald (what mlreg does) is totally OK here (as well as applying t-test or score test or LRT would)

And here is an example for logistic regression:

Code: Select all

`# make a fake binary trait`

bt <- 1*(phdata(ge03d2.clean)$bmi>30)

# logistic regression with mlreg

mlrB <- mlreg(bt~1,data=ge03d2.clean,trait="binomial")

# logistic regression with glm-binomial

glmB <- summary(glm(bt ~ as.numeric(ge03d2.clean[,bestSNP]),

data=phdata(ge03d2.clean),

fam="binomial"))

# see that results are very much the same, also for p-values

# because Wald test is used by both mlreg and glm-binomial

mlrB[bestSNP,,drop=F]

glmB

Here results are the same, also for P-values because all functions use Wald in this case.

Hope this explains the story.

Note that (Gen)ABELs are dynamically developing; while this post is intended to provide full information at the time of posting, please read on further posts, if any, as the topic may be updated with novel solutions at a later stage.

best regards,

Yurii

best regards,

Yurii

### Re: Formula for polygenic(), qtscore(), mlreg()

Now answer for

mmscore = T-score (but mind that we use genomic kinship and not pedigree kinship as a rule)

T-LRT is not implemented in GenABEL, but this is something we work together with William Astle to have in MixABEL

mmscore() is very good method, so I would not say "screening only" -- rather this is something which can be used right away

Model is described in details in Goncalo's paper (http://www.ncbi.nlm.nih.gov/pubmed/16685720) or see ProbABEL paper (http://www.biomedcentral.com/1471-2105/11/134) for extension of this approach; technical details are given in help(mmscore) and manual for ProbABEL.

In short, following my notation, the model is

phenotype* = mu + beta*SNP + G + e

with phenotype* being fixed-effects adjusted residuals from polygenic model, and parameters underlying distribution of random effects (G & e) not being estimated for each SNP, but rather taken from polygenic model

phenotype = mu + ...[extra covaruates]... + G + e

Hope this helps!

(2) Some more questions about the mmscore().

mmscore = T-score (but mind that we use genomic kinship and not pedigree kinship as a rule)

T-LRT is not implemented in GenABEL, but this is something we work together with William Astle to have in MixABEL

mmscore() is very good method, so I would not say "screening only" -- rather this is something which can be used right away

Model is described in details in Goncalo's paper (http://www.ncbi.nlm.nih.gov/pubmed/16685720) or see ProbABEL paper (http://www.biomedcentral.com/1471-2105/11/134) for extension of this approach; technical details are given in help(mmscore) and manual for ProbABEL.

In short, following my notation, the model is

phenotype* = mu + beta*SNP + G + e

with phenotype* being fixed-effects adjusted residuals from polygenic model, and parameters underlying distribution of random effects (G & e) not being estimated for each SNP, but rather taken from polygenic model

phenotype = mu + ...[extra covaruates]... + G + e

Hope this helps!

Note that (Gen)ABELs are dynamically developing; while this post is intended to provide full information at the time of posting, please read on further posts, if any, as the topic may be updated with novel solutions at a later stage.

best regards,

Yurii

best regards,

Yurii

### Who is online

Users browsing this forum: No registered users and 3 guests