palinear with mmscore - high memory use

stuartmcg
Posts: 3
Joined: Tue Nov 15, 2016 3:40 pm

Postby stuartmcg » Tue Nov 15, 2016 5:06 pm

We are running into memory issues with palinear while using the --mmscore option.

Our cohort (held in file vector format) has about 6k individuals, and the phenotype file for this run has non-NA values for only 800 of them, so only those 800 are selected. We are seeing memory usage in excess of 50 GB for a single chromosome with 5M SNPs. The ProbABEL manual does mention high memory use, but quotes roughly 5 GB for 2.5M SNPs. Are we consuming much more than expected?
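As a rough sanity check on those numbers, the manual's figure can be extrapolated to our SNP count. This is only a sketch: it assumes memory scales linearly with the number of SNPs, which is my assumption and not something the manual states, and the function name `extrapolate_memory_gb` is made up for illustration.

```python
def extrapolate_memory_gb(ref_gb, ref_snps, snps):
    """Scale a reference memory figure linearly with SNP count (an assumption)."""
    return ref_gb * snps / ref_snps

# Manual's figure: ~5 GB at 2.5M SNPs; our run: ~5M SNPs, >50 GB observed.
expected_gb = extrapolate_memory_gb(5.0, 2_500_000, 5_000_000)
observed_gb = 50.0
ratio = observed_gb / expected_gb

print(f"expected ~{expected_gb:.0f} GB, observed >{observed_gb:.0f} GB ({ratio:.0f}x)")
```

Under that assumption the run would need about 10 GB, so the observed usage is at least five times higher.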

I am currently running palinear under Valgrind to check for memory leaks (nothing reported so far). Memory consumption does seem to grow over time, which I did not expect: my understanding was that the SNP data is processed row by row. Is my understanding correct?
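One way to see whether memory really grows steadily during a run is to sample the resident set size of the process at intervals. The sketch below is an illustration only and not part of ProbABEL. It assumes a Linux system with `/proc`, and the helper names `read_rss_kib` and `sample_rss` are hypothetical.

```python
import subprocess
import sys
import time


def read_rss_kib(pid):
    """Return the resident set size of `pid` in KiB, or None if unavailable.

    Reads the VmRSS field from /proc/<pid>/status (Linux only).
    """
    try:
        with open(f"/proc/{pid}/status") as fh:
            for line in fh:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except (FileNotFoundError, ProcessLookupError):
        return None
    return None  # e.g. the process has already exited


def sample_rss(cmd, interval=0.5):
    """Run `cmd` and return a list of (seconds_elapsed, rss_kib) samples."""
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    samples = []
    while proc.poll() is None:
        rss = read_rss_kib(proc.pid)
        if rss is not None:
            samples.append((time.monotonic() - start, rss))
        time.sleep(interval)
    return samples
```

Called with the full palinear command line as a list of arguments, this returns a time series. A series that keeps climbing for the whole analysis would suggest a leak or accumulation, while a plateau after loading would suggest the data are simply held in memory.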


Regards,
Stuart.


Here's the command line with version information:

 palinear -p GP8_mmscore_pheno_minimal_PCS.PHE -i /chr8_sorted.mlinfo -
probabel v. 0.5.0
(C) Yurii Aulchenko, Lennart C. Karssen, Maarten Kooyman, Maksim Struchalin, The GenABEL team, EMC Rotterdam

Using EIGEN version 3.2.1 for matrix operations
Options in effect:
         --pheno       = minimal_PCS.PHE
         --info        = chr8_sorted.mlinfo
         --dose        = chr8_sorted.prob.fvi
                     (using FVF data)
         --ntraits     = 1
         --ngpreds     = 2
         --interaction = 0
         --mmscore = minimal_PCS.dat
         --map     = not in output
         --nids    = estimated from data
         --chrom   = not in output
         --out     = chr8_mmscore_prob
         --skipd   = 2
         --separat = ","
         --flipmaf = OFF
         --score   = OFF
         --nohead  = OFF
         --allcov  = OFF
         --robust  = OFF

Reading info data...
Number of SNPs = 4961197
Reading phenotype data...
Actual number of people in phenofile = 5884; using all of these
Linear model: ( residuals ) ~ mu + SNP_A1
You are running mmscore...
 loaded InvSigma...
Reading genotype data... done.
 loaded null data... estimated null model... formed regression object...
Analysis:  2.88%...
Last edited by stuartmcg on Wed Nov 16, 2016 1:34 pm, edited 2 times in total.

lckarssen
Site Admin
Posts: 321
Joined: Tue Jan 04, 2011 3:04 pm
Location: Utrecht, The Netherlands

Re: palinear with mmscore - high memory use

Postby lckarssen » Wed Nov 16, 2016 8:52 am

That's interesting... As you already supposed, this kind of memory consumption is not expected. 6000 individuals and 5M SNPs should work. The fact that you only have phenotype data on 800 of them might (or might not) point to the problematic part of the code. In my personal experience we usually had phenotype data for most if not all individuals. Of course I don't know whether this is the case for other ProbABEL users, but I could imagine that the code path for NA phenotypes is less well tested.

I'd be happy to hear what comes out of the valgrind test. We have some valgrind tests that are automatically run before release, but I'm sure they don't cover the whole code base.

Have you, by any chance, run this same analysis with an older version of ProbABEL?
-------
Lennart Karssen
PolyOmica
The Netherlands
-------

stuartmcg
Posts: 3
Joined: Tue Nov 15, 2016 3:40 pm

Re: palinear with mmscore - high memory use

Postby stuartmcg » Wed Nov 16, 2016 1:32 pm

We were originally running version 0.4.4 and experienced the issue with that version. Updating to the latest release did not make things any better. I will try creating a simulated phenotype across the full cohort to see if this changes the outcome.

lckarssen
Site Admin
Posts: 321
Joined: Tue Jan 04, 2011 3:04 pm
Location: Utrecht, The Netherlands

Re: palinear with mmscore - high memory use

Postby lckarssen » Wed Nov 16, 2016 1:41 pm

Good idea to run a simulated phenotype. Looking forward to the outcome!

stuartmcg
Posts: 3
Joined: Tue Nov 15, 2016 3:40 pm

Re: palinear with mmscore - high memory use

Postby stuartmcg » Mon Nov 21, 2016 6:40 pm

Just to update a little on this:

When we originally used the --mmscore option, we were running version 0.4.4. We noted high memory use at that point, so we updated to the latest version, 0.5.0, which suffered from the same problem. Since the --mmscore runs we had performed were on a subset of the cohort, I switched back to the full cohort phenotype file. To my surprise, even without the --mmscore option, version 0.5.0 now consumes 10x more memory per chromosome than version 0.4.4. See the attached image of the same command line being used with /usr/bin/palinear (version 0.4.4) and palinear (version 0.5.0). The older version only consumes 2 GB versus 20 GB+ on the latest release.
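A comparison like this can also be made without a screenshot by recording the peak resident set size of each run. The following is a minimal sketch, assuming a Unix system and Python 3.9 or later. The helper name `peak_rss` is hypothetical, and nothing here is specific to ProbABEL.

```python
import os
import subprocess
import sys


def peak_rss(cmd):
    """Run `cmd` to completion and return its peak resident set size.

    The value is ru_maxrss from getrusage: KiB on Linux, bytes on macOS.
    """
    proc = subprocess.Popen(cmd)
    # os.wait4 reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    # Tell the Popen object the child has already been reaped.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return usage.ru_maxrss
```

Calling `peak_rss` once with the old binary and once with the new one, using otherwise identical arguments, gives two numbers that can be compared directly.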

I'm looking at the source to see if I can identify the cause.

lckarssen
Site Admin
Posts: 321
Joined: Tue Jan 04, 2011 3:04 pm
Location: Utrecht, The Netherlands

Re: palinear with mmscore - high memory use

Postby lckarssen » Tue Nov 22, 2016 9:03 am

Wow... 10× more RAM usage is definitely not good.

I have opened issue #31 on GitHub to track this.

lckarssen
Site Admin
Posts: 321
Joined: Tue Jan 04, 2011 3:04 pm
Location: Utrecht, The Netherlands

Re: palinear with mmscore - high memory use

Postby lckarssen » Tue Dec 06, 2016 10:39 pm

Thanks to Maarten (see the GitHub issue) I now know how I missed the memory leaks in the v0.5.0 release (the Jenkins build checks had silently failed after an upgrade). In the meantime I have also fixed the issue.

Do you have time to test the latest commit in the Git master branch? Or, if you haven't worked with Git/GitHub before, I could make you a tar.gz file. It would be great to have a test on a real dataset before I do a bugfix release.

