I am trying to create a simple Snakemake workflow and I am having some issues. My file generates the following errors:
MissingInputException in line 29 of /home/agalvez/data/workflow-workshop/test/Snakefile: Missing input files for rule hmm: hmmsearch --tblout output_tblout_egf --noali -E 99
MissingInputException in line 49 of /home/agalvez/data/workflow-workshop/test/Snakefile: Missing input files for rule create_archive: output/EP00771_Trimastix_marina.out output/EP00759_Prokinetoplastina_sp_PhF-6.out
This is the first time I have tried to use Snakemake or anything related to Python, so I do not understand why this is failing. Any help would be really appreciated. Thanks in advance!
ARCHIVE_FILE = 'output.tar.gz'
# a single output file
OUTPUT_FILE = 'output/{species}.out'
# a single input file
INPUT_FILE = 'proteins/{species}.fasta'
# Build the list of input files.
INP = glob_wildcards(INPUT_FILE).species
print(INP)
# The list of all output files
OUT = expand(OUTPUT_FILE, species=INP)
print(OUT)
# pseudo-rule that tries to build everything.
# Just add all the final outputs that you want built.
rule all:
    input: ARCHIVE_FILE
# hmmsearch
rule hmm:
    input:
        cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
        species=INPUT_FILE,
        hmm='hmm/EGF.hmm'
    output: OUTPUT_FILE
    shell: '{input.cmd} {input.hmm} {input.species} {output}'
# hmmsearch
#rule hmm:
# shell: 'hmmsearch --tblout output_tblout_egf --noali -E 99 hmm/EGF.hmm INPUT_FILE OUTPUT_FILE'
# create an archive with all results
rule create_archive:
    input: OUT
    output: ARCHIVE_FILE
    shell: 'tar -czvf {output} {input}'
You have:
rule hmm:
    input:
        cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
        species=INPUT_FILE,
        hmm='hmm/EGF.hmm'
    output: OUTPUT_FILE
    shell: '{input.cmd} {input.hmm} {input.species} {output}'
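cmd is a string containing a command, not an input file, hence the error: Snakemake treats every entry under input as a file path and reports it as missing when no such file exists and no rule can create it. The second error, for create_archive, presumably follows from the first, since rule hmm is the only rule that could produce the output/*.out files. Maybe you want something like this: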
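rule hmm:
    input:
        species=INPUT_FILE,
        hmm='hmm/EGF.hmm'
    output:
        OUTPUT_FILE,
    params:
        # The command string is not a file, so it belongs in params.
        cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
    shell:
        # hmmsearch takes only <hmmfile> <seqdb> as positional arguments,
        # so its standard output is redirected to the output file here.
        '{params.cmd} {input.hmm} {input.species} > {output}'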
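
As a rough illustration of the point above, the following plain-Python sketch mimics the check behind the error: every named entry under input is treated as a file path (after filling in the {species} wildcard) and reported if it does not exist. This is not Snakemake's actual code; missing_inputs and demo are hypothetical helper names, and the temporary directory only stands in for the workflow directory.

```python
import os
import tempfile


def missing_inputs(inputs, wildcards, workdir):
    """Return the named inputs that do not exist as files under workdir.

    Hypothetical helper: a simplified stand-in for the input check,
    not Snakemake's real implementation.
    """
    missing = {}
    for name, pattern in inputs.items():
        path = pattern.format(**wildcards)  # fill in {species}
        if not os.path.exists(os.path.join(workdir, path)):
            missing[name] = path
    return missing


def demo():
    # The input section of the original rule hmm, including the command string.
    inputs = {
        "cmd": "hmmsearch --tblout output_tblout_egf --noali -E 99",
        "species": "proteins/{species}.fasta",
        "hmm": "hmm/EGF.hmm",
    }
    wildcards = {"species": "EP00771_Trimastix_marina"}

    with tempfile.TemporaryDirectory() as workdir:
        # Create empty placeholder files for the two real inputs.
        for rel in ("proteins/EP00771_Trimastix_marina.fasta", "hmm/EGF.hmm"):
            full = os.path.join(workdir, rel)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, "w").close()
        return missing_inputs(inputs, wildcards, workdir)


if __name__ == "__main__":
    # Only the command string is reported as a missing "file".
    print(demo())
```

In this sketch only cmd is reported as missing, even though both real input files exist, which is why moving the command string from input to params makes the error go away.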