Snakemake rule to write a new text file from input variables (Snakemake syntax)


I've got a fully functional Snakemake workflow, but I'd like to add a rule where the input variables are written out as new lines in a newly generated output text file. To briefly summarize, I've included relevant code below:

OUTPUTDIR = config["outputDIR"] 
SAMPLEID = list(SAMPLE_TABLE.Sample_Name)
# Above 2 lines are functional in other parts of script.

rule all:
  input:
    manifest = OUTPUTDIR + "/manifest.txt"

rule write_manifest:
  input:
    sampleid = SAMPLEID,
    loc_r1 = expand("{base}/trimmed/{sample}_1.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST),
    loc_r2 = expand("{base}/trimmed/{sample}_2.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST)
  output:
    OUTPUTDIR + "/manifest.txt"
  shell:
    """
    echo "{input.sampleid},{input.loc_r1},forward" >> {output}
    echo "{input.sampleid},{input.loc_r2},reverse" >> {output}
    """

My issue is that Snakemake treats the sample IDs as input files, and I need it to print the file path or sample ID that it is detecting instead. Help with syntax?

Desired output file needs to look like this:

depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse

Trying to write this using echo.

Error message:

Building DAG of jobs...
MissingInputException in [write_manifest]:
Missing input files for rule write_manifest:
sample1
sample2
sample3

UPDATE: by adding sampleid to params:

rule write_manifest:
  input:
    loc_r1 = expand("{base}/trimmed/{sample}_{suf}_1.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
    loc_r2 = expand("{base}/trimmed/{sample}_{suf}_2.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
  output:
    OUTPUTDIR + "/manifest.txt"
  params:
    sampleid = SAMPLEID
  shell:
    """
    echo "{params.sampleid},{input.loc_r1},forward" >> {output}
    echo "{params.sampleid},{input.loc_r2},reverse" >> {output}
    """

My output looked like this (which is incorrect):

sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,forward
sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,reverse

This is still not what I want: every sample ID and every path ends up on the same line. Can I write the rule so that Snakemake loops through each sample/input/params? The desired output file needs to look like this:

depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse

Solution

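  • Use the sample wildcard in params instead of the variable SAMPLEID, so that each job gets the sample ID specific to that execution of the rule. Note that this only works if the rule's output also contains the {sample} wildcard, because Snakemake resolves wildcard values from the output file names.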

    params:
        sample = '{sample}'
    shell:
        """
        echo "{params.sample},{input.loc_r1},forward" >> {output}
        echo "{params.sample},{input.loc_r2},reverse" >> {output}
        """
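
If a single manifest.txt covering all samples is wanted, as in the question's desired output, another option is to pair each sample ID with its read files in plain Python rather than in echo. The following is only a minimal sketch: the helper names manifest_lines and write_manifest are hypothetical (not part of Snakemake), and it assumes the sample IDs and the two path lists have the same length and the same order.

```python
def manifest_lines(sample_ids, r1_paths, r2_paths):
    """Return 'id,path,direction' lines, pairing the three lists by position."""
    # Assumption: the i-th sample ID belongs to the i-th forward/reverse path.
    if not (len(sample_ids) == len(r1_paths) == len(r2_paths)):
        raise ValueError("sample IDs and read paths must have the same length")
    lines = []
    for sample_id, r1, r2 in zip(sample_ids, r1_paths, r2_paths):
        lines.append(f"{sample_id},{r1},forward")
        lines.append(f"{sample_id},{r2},reverse")
    return lines


def write_manifest(path, sample_ids, r1_paths, r2_paths):
    """Write the manifest in one pass, overwriting any existing file."""
    with open(path, "w") as handle:
        for line in manifest_lines(sample_ids, r1_paths, r2_paths):
            handle.write(line + "\n")
```

Inside the rule, the shell: block could then be replaced by a run: block that calls write_manifest(output[0], params.sampleid, input.loc_r1, input.loc_r2). Because the file is opened in write mode rather than appended to with >>, rerunning the rule should not accumulate duplicate lines.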