
How to rename samples based on dictionary values


I am having trouble writing a Snakemake rule that renames my samples. After demultiplexing with Porechop and some basic trimming with Filtlong, I would like to rename the files from, e.g., BC01_trimmed.fastq.gz to E_coli_trimmed.fastq.gz. The idea is that my config file contains a dictionary that links each barcode to the sample it was used for.

Based on this previously asked question, I wrote this piece of example code.

mydictionary = {
    'BC01': 'bacteria_A',
    'BC02': 'bacteria_B'
}

rule all:
    input:
        expand('{bacteria}_trimmed.fastq.gz', bacteria=mydictionary.values())

rule changeName:
    input:
        '{barcode}_trimmed.fastq.gz'
    params:
        value=lambda wcs: mydictionary[wcs.bacteria]
    output:
        '{params.value}_trimmed.fastq.gz'
    shell:
        "mv {input} {output}"

But I receive the error:

WildcardError in rule changeName in file Snakefile:
Wildcards in input files cannot be determined from output files:
'barcode'

Thanks in advance


Solution

  • Let's try again... I would reverse the dictionary, since the input function needs to look up the barcode for a given sample name. (You can of course swap the keys and values with Python code rather than rewriting the dictionary by hand.)

    To resolve the cyclic dependency and similar errors, I think you need to either constrain the wildcard to the values you have in your dictionary, which effectively disables the default regex matching, or write the renamed files to a different directory. (I really like Snakemake, but I find this behaviour quite confusing.) I use the wildcard_constraints pattern below quite liberally, for any wildcard, to avoid this issue. So:

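    import re

    # Sample name -> barcode (the question's dictionary, reversed)
    mydictionary = {
        'bacteria_A': 'BC01',
        'bacteria_B': 'BC02'
    }

    # Restrict {bacteria} to the known sample names so it cannot match a barcode
    wildcard_constraints:
        bacteria='|'.join([re.escape(x) for x in mydictionary.keys()]),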
    
    rule all:
        input:
            expand('{bacteria}_trimmed.fastq.gz', bacteria=mydictionary.keys())
    
    rule changeName:
        input:
            lambda wcs: '%s_trimmed.fastq.gz' % mydictionary[wcs.bacteria],
        output:
            '{bacteria}_trimmed.fastq.gz'
        shell:
            "mv {input} {output}"
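
To see why the constraint helps, here is a rough plain-Python sketch (not Snakemake itself). It reverses the question's barcode-to-sample dictionary, builds the same alternation used in wildcard_constraints above, and compares it with Snakemake's default wildcard regex, `.+`. The names barcode_to_sample, sample_to_barcode and matches_output are made up for illustration, and matches_output only approximates how an output pattern is matched against a requested file.

```python
import re

# Barcode -> sample name, as in the question.
barcode_to_sample = {"BC01": "bacteria_A", "BC02": "bacteria_B"}

# Reverse it so a sample name looks up its barcode.
# Assumption: sample names are unique, otherwise entries would be overwritten.
sample_to_barcode = {sample: barcode for barcode, sample in barcode_to_sample.items()}

# The same alternation that the wildcard_constraints block builds.
constraint = "|".join(re.escape(name) for name in sample_to_barcode)


def matches_output(filename, wildcard_regex):
    """Hypothetical helper: does filename fit '{bacteria}_trimmed.fastq.gz'
    when {bacteria} is allowed to match wildcard_regex?"""
    pattern = f"({wildcard_regex})_trimmed\\.fastq\\.gz"
    return re.fullmatch(pattern, filename) is not None


# Default wildcard regex: the barcode file also looks like a valid output.
unconstrained = matches_output("BC01_trimmed.fastq.gz", ".+")
# Constrained wildcard: only the sample names are accepted.
constrained = matches_output("BC01_trimmed.fastq.gz", constraint)
print(unconstrained, constrained)
```

With the default regex, BC01_trimmed.fastq.gz also fits the output pattern of changeName, so the rule can be asked to produce its own input. With the constraint in place, only the sample names match, and the barcode files are treated as plain inputs.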