
Using a lambda function to download files in Snakemake: 'This IOFile is specified as a function and may not be used directly.'


I am trying to download FASTQ files from an FTP server using Snakemake so that I can post-process them. The file locations are in the "read1" and "read2" columns of data.tsv. When I run the code below, I get the following error:

ValueError in line 17 ...
This IOFile is specified as a function and may not be used directly.

Line 17 refers to the shell directive. From what I could find online, the lambda functions look correct, and lambda functions are accepted in params.

Here's my code:

import pandas as pd

samples = pd.read_table("data.tsv").set_index("sample", drop=False)

rule all:
    input:
        lambda wildcards: samples.to_dict()["read1"][wildcards.sample].split('/')[-1],
        lambda wildcards: samples.to_dict()["read2"][wildcards.sample].split('/')[-1]

rule dl:
    output:
        temp(lambda wildcards: samples.to_dict()["read1"][wildcards.sample].split('/')[-1]),
        temp(lambda wildcards: samples.to_dict()["read2"][wildcards.sample].split('/')[-1])
    params:
        read1 = lambda wildcards: samples.to_dict()["read1"][wildcards.sample],
        read2 = lambda wildcards: samples.to_dict()["read2"][wildcards.sample]
    shell:
        "wget {params.read1}; wget {params.read2}"

Please help - I can't figure out what's wrong.

EDIT 1

In case it's helpful, the following code using remote files works (also suggested by euronion below):

import pandas as pd
from snakemake.remote.FTP import RemoteProvider as FTPRemoteProvider

FTP = FTPRemoteProvider()

samples = pd.read_table("data.tsv").set_index("sample", drop=False)

rule all:
    input:
        expand("results/{sample}.sam", sample = samples["sample"])

rule bwa:
    input:
        v = "data/ref.fna",
        read1 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read1']),
        read2 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read2'])
    output:
        "results/{sample}.sam"
    shell:
        "scripts/bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem {input.v} {input.read1} {input.read2} > {output}"

EDIT 2

The issue with my original attempt is that Snakemake doesn't allow lambda functions in output. The following minimal example:

read1={'s1': 'test1/ERR7671976_1.fastq.gz'}
read2={'s1': 'test1/ERR7671976_2.fastq.gz'}

rule all:
    input:
        lambda wildcards: read1[wildcards.sample],
        lambda wildcards: read2[wildcards.sample]

rule test:
    output:
        lambda wildcards: read1[wildcards.sample],
        lambda wildcards: read2[wildcards.sample]
    params:
        r1 = lambda wildcards: read1[wildcards.sample],
        r2 = lambda wildcards: read2[wildcards.sample]
    shell:
        """
        touch {params.r1}
        touch {params.r2}
        """

fails with "SyntaxError: Only input files can be specified as functions", while the following version, with user-defined output filenames:

read1={'s1': 'test1/ERR7671976_1.fastq.gz'}
read2={'s1': 'test1/ERR7671976_2.fastq.gz'}

rule all:
    input:
        expand("{sample}_1.fastq.gz", sample=read1.keys()),
        expand("{sample}_2.fastq.gz", sample=read2.keys())

rule test:
    output:
        '{sample}_1.fastq.gz',
        '{sample}_2.fastq.gz'
    params:
        r1 = lambda wildcards: read1[wildcards.sample],
        r2 = lambda wildcards: read2[wildcards.sample]
    shell:
        """
        touch {params.r1}; mv {params.r1} {wildcards.sample}_1.fastq.gz
        touch {params.r2}; mv {params.r2} {wildcards.sample}_2.fastq.gz
        """

works fine.
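
One way to see why output has to be a plain pattern: Snakemake starts from a requested file, matches it against each rule's output pattern to recover the wildcard values, and only then calls the functions given in input or params with those values. An output function would need the wildcards before they exist. The following is a rough sketch of that matching step in plain Python, not Snakemake's actual implementation; `pattern_to_regex` and `infer_wildcards` are hypothetical names, and the sketch assumes each wildcard appears only once in a pattern.

```python
import re

def pattern_to_regex(pattern):
    """Turn an output pattern such as '{sample}_1.fastq.gz' into a compiled regex."""
    # re.split with one capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text, matched as-is
        else:
            regex += f"(?P<{part}>.+)"      # wildcard, matches any non-empty string
    return re.compile(regex)

def infer_wildcards(pattern, target):
    """Return the wildcard values if target fits the pattern, else None."""
    match = pattern_to_regex(pattern).fullmatch(target)
    return match.groupdict() if match else None

# Same lookup table as in the example above.
read1 = {'s1': 'test1/ERR7671976_1.fastq.gz'}

# Step 1: the requested file is matched against the output pattern.
wildcards = infer_wildcards("{sample}_1.fastq.gz", "s1_1.fastq.gz")

# Step 2: only now can a params-style lookup be evaluated.
resolved = read1[wildcards["sample"]]
print(wildcards, resolved)
```

With a lambda in output there is no pattern for step 1 to match against, so the wildcard values needed to call the lambda are never available.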


Solution

  • In case it's helpful, the following code using remote files works (also suggested by euronion):

    import pandas as pd
    from snakemake.remote.FTP import RemoteProvider as FTPRemoteProvider
    
    FTP = FTPRemoteProvider()
    
    samples = pd.read_table("data.tsv").set_index("sample", drop=False)
    
    rule all:
        input:
            expand("results/{sample}.sam", sample = samples["sample"])
    
    rule bwa:
        input:
            v = "data/ref.fna",
            read1 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read1']),
            read2 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read2'])
        output:
            "results/{sample}.sam"
        shell:
            "scripts/bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem {input.v} {input.read1} {input.read2} > {output}"
    

    The issue with my original attempt is that Snakemake doesn't allow lambda functions in output (see Edit 2 above).
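
For the original goal of downloading with wget, the same idea can be sketched in plain Python: give every download a fixed, sample-based output name and look up only the remote location per sample. This is an illustration under assumptions, not tested against a real server. The table contents, the `example.org` host, and the helper names `load_samples`, `local_name` and `download_command` are all hypothetical; the column names follow data.tsv as described in the question.

```python
import csv
import io
import shlex

# Hypothetical stand-in for data.tsv; the real file and its URLs are not shown in the post.
EXAMPLE_TSV = (
    "sample\tread1\tread2\n"
    "s1\tftp://example.org/test1/ERR7671976_1.fastq.gz"
    "\tftp://example.org/test1/ERR7671976_2.fastq.gz\n"
)

def load_samples(handle):
    """Map each sample name to its row (a dict with 'read1' and 'read2')."""
    return {row["sample"]: row for row in csv.DictReader(handle, delimiter="\t")}

def local_name(sample, mate):
    """Fixed output name built from the sample, e.g. 's1_1.fastq.gz'."""
    return f"{sample}_{mate}.fastq.gz"

def download_command(url, dest):
    """Shell command that writes the remote file to the declared output path."""
    return f"wget -O {shlex.quote(dest)} {shlex.quote(url)}"

samples = load_samples(io.StringIO(EXAMPLE_TSV))

# Concrete target files, the role expand() plays in rule all.
targets = [local_name(s, mate) for s in samples for mate in (1, 2)]

# One command per target; the URL is looked up per sample, as the params lambdas do.
commands = [
    download_command(samples[s][f"read{mate}"], local_name(s, mate))
    for s in samples
    for mate in (1, 2)
]
print(targets)
print("\n".join(commands))
```

In a Snakefile this would roughly correspond to an output of `"{sample}_1.fastq.gz"`, a params entry such as `read1 = lambda wildcards: samples.loc[wildcards.sample, 'read1']`, and a shell command like `wget -O {output[0]} {params.read1}`. Writing directly to the declared output with `-O` avoids the separate `mv` step.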