
snakemake interpreting full path as relative path


I am writing a Snakemake workflow that performs multiple operations. All rules except the last one (mvQsubLogs) work on a test file. This last rule should move the .e and .o files produced by the qsub command (I am running Snakemake on a cluster) from the directory specified with the -e and -o flags to the directory specified in the output directive of my rule, once the other operations are completed (see the input directive in the rule below):

rule mvQsubLogs:
    input:
        # FastQC
        rules.fastQC.output,

        # Markduplicates
        rules.markDups.output.markDupBam,
        rules.markDups.output.markDupMetrics,

        # mosdepth
        rules.mosdepth.output.DIR,

        # editflagStat
        rules.edit_flagStat.output,

        # edit idxStats
        rules.edit_idxStats.output,

        # insertSizeMetrics
        rules.insertSizeMetrics.output.METRICS,
        rules.insertSizeMetrics.output.PDF

    output:
        directory("{sample}/logs")
    shell:
        "mkdir -p {wildcards.sample}/logs " 
        "| mv {LOGDIR}{wildcards.sample}* {output}"

A DAG with all jobs that I want to perform can be found below:

[Image: DAG of the workflow jobs]

The command that I am using to launch the jobs to the cluster is

snakemake -p -s Snakefile_v6_ngs_bngs05b --cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR}" -j 5 --use-conda --jobname "{wildcards.sample}.{rule}.{jobid}"

Note where the .e and .o files are written, which in this example is LOGDIR. LOGDIR is read from the config file: the Snakefile sets LOGDIR = config['logsOutDir'], and the config file specifies logsOutDir: "/home/ngs/jobout/".
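A minimal sketch of how the source pattern for mv is assembled from this setting. The dictionary below only stands in for the parsed config file, and log_pattern is a hypothetical helper, not part of the Snakefile. It illustrates that plain concatenation, as in {LOGDIR}{wildcards.sample}*, relies on the trailing slash in logsOutDir, whereas os.path.join does not (paths shown assume a POSIX system):

```python
import os

# Stand-in for the parsed config file (assumption: same key and value as in the question).
config = {"logsOutDir": "/home/ngs/jobout/"}
LOGDIR = config["logsOutDir"]


def log_pattern(logdir, sample):
    """Build the glob pattern for a sample's qsub logs, with or without a trailing slash on logdir."""
    return os.path.join(logdir, sample + "*")


sample = "NIPT-PearlPPlasma-03-PPx_S3downSample"

# Plain concatenation, as in the shell directive: only correct because LOGDIR ends in "/".
concatenated = LOGDIR + sample + "*"

# os.path.join gives the same pattern here and also works if the slash is dropped from the config.
joined = log_pattern(LOGDIR, sample)
joined_no_slash = log_pattern(LOGDIR.rstrip("/"), sample)
```

With the config value from the question, all three variables hold the same pattern; they only diverge if the trailing slash is removed from logsOutDir.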

When I run the full workflow, the command that I get for the rule mvQsubLogs is:

rule mvQsubLogs:
    input: NIPT-PearlPPlasma-03-PPx_S3downSample/fastQC, NIPT-PearlPPlasma-03-PPx_S3downSample/aligned/NIPT-PearlPPlasma-03-PPx_S3downSample.sorted.markDup.bam, NIPT-PearlPPlasma-03-PPx_S3downSample/dups/NIPT-PearlPPlasma-03-PPx_S3downSample.markDups.metrics.txt, NIPT-PearlPPlasma-03-PPx_S3downSample/depth/, NIPT-PearlPPlasma-03-PPx_S3downSample/dups/NIPT-PearlPPlasma-03-PPx_S3downSample.sorted.markDup.flagstat.edited.csv, NIPT-PearlPPlasma-03-PPx_S3downSample/readsDist/NIPT-PearlPPlasma-03-PPx_S3downSample.sorted.markDup.idxstats.edited.csv, NIPT-PearlPPlasma-03-PPx_S3downSample/insertSizeDist/NIPT-PearlPPlasma-03-PPx_S3downSample_ISmetrics.txt, NIPT-PearlPPlasma-03-PPx_S3downSample/insertSizeDist/NIPT-PearlPPlasma-03-PPx_S3downSample_ISHist.pdf
    output: NIPT-PearlPPlasma-03-PPx_S3downSample/logs
    jobid: 7
    wildcards: sample=NIPT-PearlPPlasma-03-PPx_S3downSample

mkdir -p NIPT-PearlPPlasma-03-PPx_S3downSample/logs | mv /home/ngs/jobout/NIPT-PearlPPlasma-03-PPx_S3downSample* NIPT-PearlPPlasma-03-PPx_S3downSample/logs

This looks right to me: after creating the directory to which the files should be moved (just to be on the safe side), it should move all files starting with NIPT-PearlPPlasma-03-PPx_S3downSample (i.e. wildcards.sample) from /home/ngs/jobout/ to NIPT-PearlPPlasma-03-PPx_S3downSample/logs, where this last directory is relative to the working directory.

Looking at the .e file generated by the mvQsubLogs rule, I get:

mkdir -p NIPT-PearlPPlasma-03-PPx_S3downSample/logs | mv /home/ngs/jobout/NIPT-PearlPPlasma-03-PPx_S3downSample* NIPT-PearlPPlasma-03-PPx_S3downSample/logs
mv: target ‘NIPT-PearlPPlasma-03-PPx_S3downSample/logs’ is not a directory

This does not make sense to me, as the output directory NIPT-PearlPPlasma-03-PPx_S3downSample/logs should have been created.

I have already tried specifying the full path of the destination directory, but that did not work either; I got the same error.

Can anyone spot where the error in my code is?


Solution

  • Try the following:

        shell:
            "mkdir -p {output} \n" 
            "mv {LOGDIR}{wildcards.sample}* {output}/"
    

    Keep the code DRY by using {output} both times. That will help if you decide to change the location later.

    Replace the pipe with a second command (the newline). A pipe starts both commands at the same time, so mv can run before mkdir has created the directory.

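    Add a trailing slash to the destination argument of mv so that it is unambiguously treated as a directory. It seems that with nested directories the target is taken as a file if the slash isn't present, although with the pipe in place the outcome may also depend on timing. E.g.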

    mkdir -p test/log | mv *.out test/log
    # mv: target ‘test/log’ is not a directory
    
    mkdir -p test/log | mv *.out test/log/
    # ok
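    The same two steps can also be sketched in plain Python, which makes the ordering explicit: the directory is created first, and only then are the files moved. This is a minimal sketch, not the asker's code; move_logs is a hypothetical helper name, and it assumes the log files sit directly in the log directory and start with the sample name.

```python
import glob
import os
import shutil


def move_logs(logdir, sample, dest):
    """Create dest, then move every entry in logdir whose name starts with sample into it."""
    # Equivalent of `mkdir -p`; this has finished before any move starts.
    os.makedirs(dest, exist_ok=True)

    # glob.escape keeps characters such as "[" in the sample name from acting as wildcards.
    pattern = os.path.join(logdir, glob.escape(sample) + "*")

    moved = []
    for path in sorted(glob.glob(pattern)):
        target = os.path.join(dest, os.path.basename(path))
        moved.append(shutil.move(path, target))
    return moved
```

    In a Snakemake rule, a helper like this could be called from a run: block instead of shell:, for example move_logs(LOGDIR, wildcards.sample, output[0]).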