Snakemake: MissingInputException


I am trying to create a simple Snakemake workflow, but my Snakefile fails with the following errors:

MissingInputException in line 29 of /home/agalvez/data/workflow-workshop/test/Snakefile: Missing input files for rule hmm: hmmsearch --tblout output_tblout_egf --noali -E 99

MissingInputException in line 49 of /home/agalvez/data/workflow-workshop/test/Snakefile: Missing input files for rule create_archive: output/EP00771_Trimastix_marina.out output/EP00759_Prokinetoplastina_sp_PhF-6.out

This is my first time using Snakemake or anything related to Python, so I do not understand why this is failing. Any help would be really appreciated. Thanks in advance!

ARCHIVE_FILE = 'output.tar.gz'

# a single output file
OUTPUT_FILE = 'output/{species}.out'

# a single input file
INPUT_FILE = 'proteins/{species}.fasta'

# Build the list of input files.
INP = glob_wildcards(INPUT_FILE).species
print(INP)

# The list of all output files
OUT = expand(OUTPUT_FILE, species=INP)
print(OUT)

# pseudo-rule that tries to build everything.
# Just add all the final outputs that you want built.
rule all:
    input: ARCHIVE_FILE

# hmmsearch
rule hmm:
    input:
        cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
        species=INPUT_FILE ,
        hmm='hmm/EGF.hmm'
    output: OUTPUT_FILE
    shell: '{input.cmd} {input.hmm} {input.species} {output}'

# hmmsearch
#rule hmm:
#     shell: 'hmmsearch --tblout output_tblout_egf --noali -E 99 hmm/EGF.hmm INPUT_FILE OUTPUT_FILE'

# create an archive with all results
rule create_archive:
    input: OUT
    output: ARCHIVE_FILE
    shell: 'tar -czvf {output} {input}'

Solution

  • You have:

    rule hmm:
        input:
            cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
            species=INPUT_FILE ,
            hmm='hmm/EGF.hmm'
        output: OUTPUT_FILE
        shell: '{input.cmd} {input.hmm} {input.species} {output}'
    

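    cmd is a string containing a command, not an input file, hence the first error: Snakemake treats every entry under input as a file that must already exist or be produced by another rule, so it looks for a file literally named 'hmmsearch --tblout output_tblout_egf --noali -E 99'. The second error is likely a consequence of the same problem, since the files in OUT can only come from rule hmm. Maybe you want something like this, with the command moved to params: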

    rule hmm:
        input:
            species=INPUT_FILE ,
            hmm='hmm/EGF.hmm'
        output: 
            OUTPUT_FILE,
        params:
            cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
        shell: 
            '{params.cmd} {input.hmm} {input.species} {output}'
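
To see what command the shell line above would run for one species, here is a minimal plain-Python sketch of the placeholder substitution. It is not Snakemake itself: `rule_input`, `rule_params`, `rule_output` and `render` are hypothetical stand-ins, and the species name is taken from the error message in the question. It assumes Snakemake fills `{input.hmm}`, `{params.cmd}` and `{output}` much like `str.format` does.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Snakemake's `input` and `params` objects.
# The real objects are richer; only attribute lookup is mimicked here.
rule_input = SimpleNamespace(
    hmm="hmm/EGF.hmm",
    species="proteins/EP00771_Trimastix_marina.fasta",
)
rule_params = SimpleNamespace(
    cmd="hmmsearch --tblout output_tblout_egf --noali -E 99",
)
rule_output = "output/EP00771_Trimastix_marina.out"


def render(template):
    """Fill a shell template using str.format's {name.attr} lookup."""
    return template.format(
        input=rule_input, params=rule_params, output=rule_output
    )


# Template from the rule above: {output} lands after the two input files.
as_written = render("{params.cmd} {input.hmm} {input.species} {output}")

# Hypothetical variant: hand the per-species file to --tblout instead.
variant = render(
    "hmmsearch --tblout {output} --noali -E 99 {input.hmm} {input.species}"
)

print(as_written)
print(variant)
```

In the first rendering the output path ends up as a third positional argument, while hmmsearch, as far as I know, expects only an HMM file and a sequence file there. The table is also written to the fixed name output_tblout_egf for every species, so Snakemake may still not find output/{species}.out after the job finishes. The variant is only one possible fix; adjust it to whichever hmmsearch output you actually want to keep.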