Snakemake: limit the memory usage of jobs


I need to run 20 genomes through a Snakemake pipeline that performs the usual steps: alignment, mark duplicates, realignment, base recalibration and so on. The machine I am using has 40 virtual cores and 70 GB of memory, and I run the pipeline like this:

snakemake -s Snakefile -j 40

This works fine, but as soon as MarkDuplicates runs alongside other programs, everything stops; I think it exceeds the available 70 GB and crashes. Is there a way to tell Snakemake to limit total memory to 60 GB across all running jobs? I would like Snakemake to run fewer jobs concurrently in order to stay under 60 GB, since some of the steps require a lot of memory. The command line below crashed as well, using more memory than allocated:

snakemake -s Snakefile -j 40 --resources mem_mb=60000

Solution

  • It's not enough to specify --resources mem_mb=60000 on the command line; you also need to specify mem_mb for the rules you want to keep in check. E.g.:

    rule markdups:
        input: ...
        output: ...
        resources:
            mem_mb=20000
        shell: ...

    rule sort:
        input: ...
        output: ...
        resources:
            mem_mb=1000
        shell: ...
    

    This will schedule jobs in such a way that you never exceed a total of 60 GB at any one time. E.g., it will run at most 3 markdups jobs, or 2 markdups jobs and 20 sort jobs, or 60 sort jobs.

    Rules without mem_mb will not count towards memory usage, which is probably fine for rules that, e.g., copy files and do not need much memory.
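
    If you want rules without mem_mb to count anyway, recent versions of Snakemake accept a --default-resources option that assigns a fallback value to every rule that does not declare the resource itself (the 1000 below is just an illustrative default):

    snakemake -s Snakefile -j 40 --resources mem_mb=60000 --default-resources mem_mb=1000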

    How much to assign to each rule is mostly a matter of guesswork. The top and htop commands help with monitoring jobs and figuring out how much memory they need. More elaborate solutions could be devised, but I'm not sure it's worth it... If you use a job scheduler like Slurm, the log files should give you the peak memory usage of each job, which you can use as guidance for future runs. Maybe others have better suggestions.
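
    If you are not on a cluster, Snakemake's benchmark directive is another way to measure this empirically: each job writes a tab-separated file that includes wall-clock time and peak memory (the max_rss column). A minimal sketch, with illustrative file names, an illustrative Picard command, and a guessed mem_mb:

    rule markdups:
        input:
            "sorted/{sample}.bam"
        output:
            "dedup/{sample}.bam"
        # each job writes a one-line TSV with runtime and peak memory (max_rss)
        benchmark:
            "benchmarks/markdups/{sample}.tsv"
        resources:
            mem_mb=20000  # initial guess; revise after inspecting the benchmarks
        shell:
            "picard MarkDuplicates I={input} O={output} M={output}.metrics"

    After a few runs you can replace the guessed values with the observed max_rss plus some headroom.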