
Setting the nextflow saveAs directive to save files in different directories


I am working on a Nextflow pipeline in which the program run by one process writes all of its results into a subfolder.

I declare the output files I need in the process output so that later processes can use them. I am also trying to publish those files to different directories depending on their names, renaming them along the way.

The files don't get saved and I can't understand why. The issue must be in the way I use saveAs, but the Nextflow documentation on it is very sparse (or I haven't found it).

I suspect the problem is that $filename contains path information rather than just a file name. Could anyone tell me what I'm doing wrong?

Below is a mock process that reproduces the problem:

#!/usr/bin/env nextflow 

Channel
.from("foo", "bar", "faz")
.set{ Input }

process mock {

    executor "local"
    maxForks 48
    cpus 1

    publishDir "${params.output_dir}",
    mode: "copy",
    pattern: "*.tab",
    saveAs: {
        filename -> 
             if (filename.contains("A.tab")) {"A/$filename"}
        else if (filename.contains("B.tab")) {"B/$filename"}
        else if (filename.contains("C.tab")) {"C/$filename"}
        else {"unassigned/$filename"}
    }

    input:
    val name from Input

    output:
    file "test/${name}.A.tab"
    file "test/${name}.B.tab"
    file "test/${name}.C.tab"
    file "test/${name}.D.tab"
    
    script:
    """
    mkdir test &&
    unset X &&
    declare -a X=(A.tab B.tab C.tab D.tab) &&
    for FILENAME in \${X[@]}
    do
        if [ ! -f test/${name}.\${FILENAME} ]
        then
            touch test/${name}.\${FILENAME}
        fi
    done
    """
}

Solution

  • I think the problem is that your files are written to a subdirectory called test, while pattern: "*.tab" only selects files from the top level of the task's working directory, so nothing matches and nothing is published. You could try changing this to pattern: "test/*.tab" or pattern: "**.tab".

    pattern

    Specifies a glob file pattern that selects which files to publish from the overall set of output files.
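
    To see why "*.tab" misses files under test/, here is a small plain-Groovy sketch. It assumes the pattern follows Java NIO glob semantics, where * does not cross a directory separator and ** does. The globMatches helper is hypothetical and not part of Nextflow:

    ```groovy
    import java.nio.file.FileSystems
    import java.nio.file.Paths

    // Hypothetical helper (not a Nextflow API): test a glob against a
    // relative path using Java NIO glob semantics.
    boolean globMatches(String glob, String relPath) {
        def matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob)
        return matcher.matches(Paths.get(relPath))
    }

    // "*" stops at the directory separator, so the nested file is not selected
    assert !globMatches('*.tab', 'test/foo.A.tab')
    assert globMatches('*.tab', 'foo.A.tab')

    // Naming the directory, or using "**", reaches into the subdirectory
    assert globMatches('test/*.tab', 'test/foo.A.tab')
    assert globMatches('**.tab', 'test/foo.A.tab')
    ```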

    An example using DSL2:

    params.output_dir = './results'
    
    
    process mock {
    
        publishDir (
            path: "${params.output_dir}/mock",
            mode: "copy",
            pattern: "test/*.tab",
            saveAs: { fn ->
                if (fn.endsWith("A.tab")) { "A/${fn}" }
                else if (fn.endsWith("B.tab")) { "B/${fn}" }
                else if (fn.endsWith("C.tab")) { "C/${fn}" }
                else { "unassigned/${fn}" }
            }
        )
    
        input:
        val name
    
        output:
        path "test/${name}.A.tab", emit: A
        path "test/${name}.B.tab", emit: B
        path "test/${name}.C.tab", emit: C
        path "test/${name}.D.tab", emit: D
    
        script:
        """
        mkdir test
        touch test/${name}.{A,B,C,D}.tab
        """
    }
    
    workflow {
    
        input_ch = Channel.of("foo", "bar", "faz")
    
        mock( input_ch )
    }
    

    Results:

    $ find results/
    results/
    results/mock
    results/mock/B
    results/mock/B/test
    results/mock/B/test/faz.B.tab
    results/mock/B/test/foo.B.tab
    results/mock/B/test/bar.B.tab
    results/mock/A
    results/mock/A/test
    results/mock/A/test/bar.A.tab
    results/mock/A/test/foo.A.tab
    results/mock/A/test/faz.A.tab
    results/mock/C
    results/mock/C/test
    results/mock/C/test/foo.C.tab
    results/mock/C/test/faz.C.tab
    results/mock/C/test/bar.C.tab
    results/mock/unassigned
    results/mock/unassigned/test
    results/mock/unassigned/test/foo.D.tab
    results/mock/unassigned/test/bar.D.tab
    results/mock/unassigned/test/faz.D.tab
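
    The test/ level in the listing above is there because the saveAs closure receives the output path relative to the task directory (for example test/foo.A.tab), not a bare file name. If you would rather have a flat layout, one option is to strip the directory part before building the target path. A minimal sketch, where routeBySuffix is a hypothetical helper name and not part of Nextflow:

    ```groovy
    // Hypothetical helper: choose a subdirectory from the file suffix and
    // drop any leading directories (such as "test/") from the published name.
    String routeBySuffix(String fn) {
        String base = new File(fn).getName()   // "test/foo.A.tab" -> "foo.A.tab"
        String dir = 'unassigned'
        for (suffix in ['A', 'B', 'C']) {
            if (base.endsWith(suffix + '.tab')) { dir = suffix }
        }
        return dir + '/' + base
    }

    assert routeBySuffix('test/foo.A.tab') == 'A/foo.A.tab'
    assert routeBySuffix('test/bar.D.tab') == 'unassigned/bar.D.tab'
    ```

    In the process above this would be used as saveAs: { fn -> routeBySuffix(fn) }, with the function defined at the top level of the script. I have not run this variant inside a pipeline, so treat it as a starting point.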
    

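    Note that DSL1, the syntax used in the question (for example val name from Input), is no longer supported in current Nextflow releases, which is why the example above is written in DSL2.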