
Snakemake InputFunctionException. AttributeError: 'Wildcards' object has no attribute


I have a list object with ChIP-seq single-end FASTQ file names, allfiles=['/path/file1.fastq','/path/file2.fastq','/path/file3.fastq']. I'm trying to use that object, allfiles, as a wildcard: I want it to be the input of the fastqc rule (and of other rules such as mapping, but let's keep it simple). I tried what is shown in the code below (lambda wildcards: data.loc[(wildcards.sample),'read1']). This, however, gives me the error:

    InputFunctionException in line 118 of Snakefile:
    AttributeError: 'Wildcards' object has no attribute 'sample'
    Wildcards:

Does anyone know exactly how to define it? I think I am close: I get the general idea, but I can't get the syntax right. Thank you!

Code:

import pandas as pd
import numpy as np

# Read in config file parameters
configfile: 'config.yaml'
sampleFile = config['samples'] # three columns: sample ID , /path/to/chipseq_file_SE.fastq , /path/to/chipseq_input.fastq
outputDir = config['outputdir'] # output directory

outDir = outputDir + "/MyExperiment"
qcDir = outDir + "/QC"

# Read in the samples table
data = pd.read_csv(sampleFile, header=0, names=['sample', 'read1', 'inputs']).set_index('sample', drop=False)
samples = data['sample'].unique().tolist() # sample IDs
read1 = data['read1'].unique().tolist() # ChIP-treatment file single-end file
inplist = data['inputs'].tolist() # the ChIP-input files (one entry per sample, may repeat)
inplistUni= data['inputs'].unique().tolist() # the ChIP-input files (unique)
allfiles = read1 + inplistUni

# Target rule
rule all:
    input:
        expand(f'{qcDir}' + '/raw/{sample}_fastqc.html', sample=samples),
        expand(f'{qcDir}' + '/raw/{sample}_fastqc.zip', sample=samples),

# fastqc report generation
rule fastqc:
    input: lambda wildcards: data.loc[(wildcards.sample), 'read1']
    output:
        html=expand(f'{qcDir}' + '/raw/{sample}_fastqc.html',sample=samples) ,
        zip=expand(f'{qcDir}' + '/raw/{sample}_fastqc.zip',sample=samples)
    log: expand(f'{logDir}' + '/qc/{sample}_fastqc_raw.log',sample=samples)
    threads: 4
    shell: "fastqc {input} 2>> {log}"
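The failing lookup can be reproduced outside Snakemake. This is a minimal stdlib-only sketch under several assumptions: csv stands in for pandas, a made-up two-sample table stands in for config['samples'], and types.SimpleNamespace stands in for Snakemake's Wildcards object. It shows that the input function only works when the object it receives carries a sample attribute.

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical samples table with the three columns described in the question
# (sample ID, ChIP-treatment FASTQ, ChIP-input FASTQ); the paths are made up.
SAMPLES_CSV = """sample,read1,inputs
s1,/path/file1.fastq,/path/input1.fastq
s2,/path/file2.fastq,/path/input1.fastq
"""

# sample ID -> row, playing the role of data.set_index('sample') in the question.
data = {row["sample"]: row for row in csv.DictReader(io.StringIO(SAMPLES_CSV))}


def input_fn(wildcards):
    """Same lookup as lambda wildcards: data.loc[wildcards.sample, 'read1'],
    but using dict indexing instead of pandas .loc."""
    return data[wildcards.sample]["read1"]


# A per-sample job: the wildcards object carries a 'sample' attribute.
print(input_fn(SimpleNamespace(sample="s1")))  # /path/file1.fastq

# A job whose outputs contain no wildcards: the object has no attributes,
# so the lookup fails with the same kind of AttributeError as in the question.
try:
    input_fn(SimpleNamespace())
except AttributeError as err:
    print("AttributeError:", err)
```

The lookup itself is fine; what matters is whether the job that calls it was created from an output pattern that still contains {sample}.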

Solution

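  • Currently, the output files of rule fastqc contain no wildcards once expand() has been resolved: expand() turns the {sample} pattern into a concrete list of paths, one per sample. That means the Snakefile has a single fastqc job that tries to produce the output files for all samples at once. Because that job has no wildcards, the wildcards object passed to the input function is empty (note the empty "Wildcards:" section of the error message), which is exactly why wildcards.sample raises AttributeError.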

    However, it appears you want to run rule fastqc separately for each sample. In that case, the rule needs to be generalized as below, keeping {sample} as a wildcard in the output and log paths instead of calling expand() there:

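    rule fastqc:
        input: lambda wildcards: data.loc[wildcards.sample, 'read1']
        output:
            html=qcDir + '/raw/{sample}_fastqc.html',
            zip=qcDir + '/raw/{sample}_fastqc.zip'
        log: logDir + '/qc/{sample}_fastqc_raw.log'
        threads: 4
        # By default FastQC writes <input basename>_fastqc.html/.zip next to the input
        # file. Unless you pass --outdir and the basenames match the sample IDs, the
        # reports may need to be moved/renamed to match the declared outputs.
        shell: "fastqc --threads {threads} {input} 2>> {log}"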
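Snakemake decides the wildcard values of a job by matching each requested file against a rule's output patterns. The sketch below is a simplified model of that matching, not Snakemake's actual implementation; the helper names pattern_to_regex and infer_wildcards and the QC/raw paths are hypothetical. Each {name} placeholder becomes a named regex group (matching .+ here, roughly like Snakemake's default), so a pattern that keeps {sample} yields a sample value, while an expand()-ed concrete path still matches but yields no wildcards at all.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'QC/raw/{sample}_fastqc.html' into a regex with one named group per
    wildcard. Simplified: assumes each wildcard name appears at most once."""
    # Splitting on a capturing group alternates: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text, special chars escaped
        else:
            regex += f"(?P<{part}>.+)"     # wildcard placeholder
    return re.compile(regex)


def infer_wildcards(pattern, target):
    """Return the wildcard values if target matches pattern, else None."""
    match = pattern_to_regex(pattern).fullmatch(target)
    return match.groupdict() if match else None


target = "QC/raw/s1_fastqc.html"

# Generalized rule: the output pattern still contains {sample}.
print(infer_wildcards("QC/raw/{sample}_fastqc.html", target))  # {'sample': 's1'}

# Original rule: expand() already produced concrete paths, so nothing is captured.
expanded = [f"QC/raw/{s}_fastqc.html" for s in ["s1", "s2"]]
print(infer_wildcards(expanded[0], target))  # {}
```

An empty result is the situation behind the error: the job exists, but there is no sample value for the input function to read.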