
Delete unwanted Snakemake Outputs


I have looked at a few other posts about Snakemake and deleting unneeded data to free up disk space. I wrote a rule called BamRemove whose touch() output is requested by my rule all. However, Snakemake isn't recognizing it, and I get this error: WildcardError in line 35 of /PATH: No values given for wildcard 'SampleID'. I don't see why. Any help getting this to work would be appreciated.

sampleIDs = d.keys()


rule all:
    input:
        expand('bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam', sampleID=sampleIDs),
        expand('bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam.bai', sampleID=sampleIDs),
        expand('logs/{SampleID}_removed.txt', sampleID=sampleIDs) #Line 35

# Some tools require unzipped fastqs
rule AnnotateUMI:
    input: 'bams/{sampleID}_unisamp_L001_001.star_rg_added.sorted.dmark.bam'
    output: 'bams/{sampleID}_L001_001.star_rg_added.sorted.dmark.bam.UMI.bam',
    # Modify each run
    params: '/data/Test/fastqs/{sampleID}_unisamp_L001_UMI.fastq.gz'
    threads: 32
    run:
         # Each user needs to set tool path
         shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.0.jar AnnotateBamWithUmis \
         -i {input} \
         -f {params} \
         -o {output}')


rule SortSam:
    input: rules.AnnotateUMI.output
    output: 'bams/{sampleID}_Qsorted.MarkUMI.bam'
    params:
    threads: 32
    run:
         # Each user needs to set tool path
         shell('java -Xmx110g -jar /data/Tools/picard.jar SortSam \
         INPUT={input} \
         OUTPUT={output} \
         SORT_ORDER=queryname')


rule MItag:
    input: rules.SortSam.output
    output: 'bams/{sampleID}_Qsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
         # Each user needs to set tool path
         shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.0.jar SetMateInformation \
         -i {input} \
         -o {output}')


rule GroupUMI:
    input: rules.MItag.output
    output: 'bams/{sampleID}_grouped.Qsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
         # Each user needs to set tool path
         shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.0.jar GroupReadsByUmi \
         -i {input} \
         -s adjacency \
         -e 1 \
         -m 20 \
         -o {output}')


rule ConcensusUMI:
    input: rules.GroupUMI.output
    output: 'bams/{sampleID}_concensus.Qunsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
         # Each user needs to set tool path
         shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.2.jar CallMolecularConsensusReads \
         --input={input} \
         --min-reads=1 \
         --output={output}')


rule STARmap:
    input: rules.ConcensusUMI.output
    output:
        log = 'bams/{sampleID}_UMI_Concensus_Log.final.out',
        bam = 'bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam'
    params: 'bams/{sampleID}_UMI_Concensus_'
    threads: 32
    run:
         # Each user needs to set the genome path
         shell('STAR \
         --runThreadN {threads} \
         --readFilesIn {input} \
         --readFilesType SAM PE \
         --readFilesCommand samtools view -h \
         --genomeDir /data/reference/star/STAR_hg19_v2.7.5c \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --limitBAMsortRAM 220000000000 \
         --outFileNamePrefix {params}')


rule Index:
    input: rules.STARmap.output.bam
    output: 'bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam.bai'
    params:
    threads: 32
    run:
         shell('samtools index {input}')


rule BamRemove:
    input:
        AnnotateUMI_BAM = rules.AnnotateUMI.output,
        AnnotateUMI_BAI = '{sampleID}_L001_001.star_rg_added.sorted.dmark.bam.UMI.bai',
        SortSam = rules.SortSam.output,
        MItag = rules.MItag.output,
        GroupUMI = rules.GroupUMI.output,
        ConcensusUMI = rules.ConcensusUMI.output
    output: touch('logs/{SampleID}_removed.txt')
    threads: 32
    run:
        shell('rm {input}')

Solution

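  • expand('logs/{SampleID}_removed.txt', sampleID=sampleIDs) #Line 35
                  ^^^^^^^^                ^^^^^^^^

    The error occurs because SampleID is spelled differently from sampleID. Wildcard names are case-sensitive, so expand() only receives values for sampleID and has nothing to fill {SampleID} with. The same mismatch appears in rule BamRemove: its output, touch('logs/{SampleID}_removed.txt'), does not contain the {sampleID} wildcard that all of its inputs use, so Snakemake cannot determine those inputs from the output either. Make the name consistent throughout the script, e.g. expand('logs/{sampleID}_removed.txt', sampleID=sampleIDs) in rule all and touch('logs/{sampleID}_removed.txt') in rule BamRemove.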
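To see why the two names must match, here is a small plain-Python sketch. mini_expand, wildcard_names and missing_input_wildcards are hypothetical helpers, not Snakemake's API; they only approximate two checks Snakemake performs: expand() needs a value for every {wildcard} in the pattern, and every wildcard in a rule's input must also appear in its output. The sample IDs are made up, and the sketch ignores wildcard regex constraints.

```python
from itertools import product
from string import Formatter


def mini_expand(pattern, **values):
    """Hypothetical, simplified stand-in for Snakemake's expand():
    fill each {name} in pattern with every combination of the given values."""
    names = list(values)
    paths = []
    for combo in product(*(values[name] for name in names)):
        try:
            paths.append(pattern.format(**dict(zip(names, combo))))
        except KeyError as missing:
            # Names are case-sensitive: {SampleID} gets nothing from sampleID=...
            raise ValueError(f"No values given for wildcard {missing}") from None
    return paths


def wildcard_names(pattern):
    """Return the set of {name} fields used in a path pattern."""
    return {field for _, field, _, _ in Formatter().parse(pattern) if field}


def missing_input_wildcards(inputs, output):
    """Hypothetical check: input wildcards that the output pattern cannot supply."""
    used = set().union(*(wildcard_names(p) for p in inputs))
    return sorted(used - wildcard_names(output))


sample_ids = ["S1", "S2"]  # hypothetical sample IDs

# rule all as posted: the pattern says SampleID, the keyword says sampleID
try:
    mini_expand("logs/{SampleID}_removed.txt", sampleID=sample_ids)
except ValueError as err:
    print(err)  # No values given for wildcard 'SampleID'

# rule all with consistent names
print(mini_expand("logs/{sampleID}_removed.txt", sampleID=sample_ids))
# ['logs/S1_removed.txt', 'logs/S2_removed.txt']

# rule BamRemove as posted: inputs use sampleID, output uses SampleID
inputs = ["bams/{sampleID}_Qsorted.MarkUMI.bam", "bams/{sampleID}_Qsorted.MarkUMI.MQ.bam"]
print(missing_input_wildcards(inputs, "logs/{SampleID}_removed.txt"))  # ['sampleID']
print(missing_input_wildcards(inputs, "logs/{sampleID}_removed.txt"))  # []
```

In this sketch, renaming SampleID to sampleID in both patterns clears both problems, which mirrors the one-word rename needed in rule all and rule BamRemove.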