
Nextflow process: how to define an output path that is not explicitly referenced in the script block?


I want to write a nextflow process that looks something like this:

process my_process{

    input:
        path x

    output:
        path 'a.txt'
        path 'b.bam'
        path 'c.fasta'

    script:
        """
        some-command --in "${x}"
        """
}

where some-command is an external program (not under my control) that creates the files a.txt, b.bam and c.fasta and places them in the same directory as the input file x.

I understand that the above process, as written, may not work. I am just not sure what the 'proper way' to handle this is. All the examples of Nextflow processes that I found online assumed that the paths in the output block (e.g. a.txt) are explicitly referenced in the script block.


Solution

  • There's nothing wrong with declaring file outputs this way. The output block just tells Nextflow to expect the three files (a.txt, b.bam and c.fasta) to be present in the task's working directory once the script (which runs some-command) has completed with a zero exit status. If any of them is missing, the task fails. You do not need to reference them explicitly in your script block.
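
    To see what that check amounts to, here is a simplified model of a single task in plain shell. It is only a sketch of the idea, not how Nextflow is implemented: the hypothetical `some_command` function stands in for the real tool and writes the three files next to its input, and the hypothetical `check_outputs` helper treats the task as successful only if the script exited with status 0 and every declared file exists in the working directory.

    ```shell
    set -eu

    # Hypothetical stand-in for the external tool: writes a.txt, b.bam and
    # c.fasta into the same directory as the file given to --in.
    some_command() {
        input=$2
        outdir=$(dirname "$input")
        echo "report for $input" > "$outdir/a.txt"
        : > "$outdir/b.bam"              # empty placeholder
        echo ">seq1" > "$outdir/c.fasta"
    }

    # Hypothetical helper: fail if any declared output name is missing.
    check_outputs() {
        for f in "$@"; do
            [ -e "$f" ] || { echo "missing output: $f" >&2; return 1; }
        done
    }

    # Mimic a task working directory with the input staged as 'x'.
    workdir=$(mktemp -d)
    cd "$workdir"
    echo "ACGT" > x

    # Run the "script block" and record its exit status.
    status=0
    some_command --in x || status=$?

    # Success requires exit status 0 AND all declared outputs present.
    if [ "$status" -eq 0 ] && check_outputs a.txt b.bam c.fasta; then
        echo "task succeeded"
    else
        echo "task failed"
    fi
    ```

    Deleting one of the file-creating lines from the stub makes the check report the missing file, which loosely mirrors the failure Nextflow raises when a declared output is absent.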

    However, if your process receives a queue channel (i.e. it runs once per input file), working with the outputs can get a bit tricky: every file emitted on a given output channel has the same basename (every item on the first channel is called a.txt, and so on), so the filename alone does not tell you which input produced it. You might instead prefer to receive an input tuple, which gives you a key to join the outputs on later, for example:

    process my_process{
    
        tag { sample }
    
        input:
        tuple val(sample), path(x)
    
        output:
        tuple val(sample), path('a.txt')
        tuple val(sample), path('b.bam')
        tuple val(sample), path('c.fasta')
    
        script:
        """
        some-command --in "${x}"
        """
    }
    

    With this approach, you might even decide to rename your output files; whether that makes sense depends on your exact requirements. You might also decide that a single output channel makes the most sense, for example:

    process my_process{
    
        tag { sample }
    
        input:
        tuple val(sample), path(x)
    
        output:
        tuple val(sample), path("${sample}.{txt,bam,fasta}")
    
        script:
        """
        some-command --in "${x}"
    
        mv 'a.txt' "${sample}.txt"
        mv 'b.bam' "${sample}.bam"
        mv 'c.fasta' "${sample}.fasta"
        """
    }
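
    The rename step can be tried outside Nextflow in the same way. In this sketch `sampleA` is a made-up value for `sample` and `some_command` is a hypothetical stand-in for the real tool. In the real process, `${sample}` is substituted by Nextflow before the script runs; here it is an ordinary shell variable. After the `mv` calls, the shell glob `"$sample".*` picks up exactly the three renamed files, much as the output pattern above does for the task.

    ```shell
    set -eu

    # Hypothetical stand-in for the external tool (fixed output names).
    some_command() {
        outdir=$(dirname "$2")
        echo "report" > "$outdir/a.txt"
        : > "$outdir/b.bam"
        echo ">seq1" > "$outdir/c.fasta"
    }

    workdir=$(mktemp -d)
    cd "$workdir"
    sample="sampleA"            # made-up value for the 'sample' input
    echo "ACGT" > input.fastq   # made-up input file

    some_command --in input.fastq

    # The same renames as in the script block.
    mv 'a.txt' "${sample}.txt"
    mv 'b.bam' "${sample}.bam"
    mv 'c.fasta' "${sample}.fasta"

    # List everything matching "<sample>.*", i.e. the three renamed files.
    for f in "$sample".*; do
        echo "$f"
    done
    ```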
    

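    If your input file is itself a .txt, .bam or .fasta file, you might also prefer to stage it into a subdirectory of the working directory, so that it's not accidentally clobbered by some-command when it runs. This assumes some-command writes its outputs to the current working directory; if it writes them next to its input instead, they will land in dir/ as well: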

    process my_process{
    
        input:
        path x, stageAs: "dir/*"
    
        output:
        path 'a.txt'
        path 'b.bam'
        path 'c.fasta'
    
        script:
        """
        some-command --in "${x}"
        """
    }
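
    The clobbering risk can be reproduced in plain shell. Nextflow stages input files as symbolic links by default, so if the input is itself called a.txt and the tool opens a.txt in the working directory for writing, the write goes through the link into the original file. This sketch assumes the tool writes to the current working directory (the hypothetical `write_outputs` function stands in for it) and compares a link at ./a.txt with one at dir/a.txt, as produced by `stageAs: "dir/*"`.

    ```shell
    set -eu

    root=$(mktemp -d)
    mkdir "$root/data"
    echo "original input" > "$root/data/a.txt"   # the real input file

    # Hypothetical stand-in for the tool, assumed to write to the CWD.
    write_outputs() {
        echo "tool output" > a.txt
        : > b.bam
        echo ">seq1" > c.fasta
    }

    # Case 1: input linked into the working directory as ./a.txt
    mkdir "$root/work1"
    cd "$root/work1"
    ln -s "$root/data/a.txt" a.txt
    write_outputs                      # writes through the link
    after_default=$(cat "$root/data/a.txt")

    # Restore the original input before the second run.
    echo "original input" > "$root/data/a.txt"

    # Case 2: input linked one level down as dir/a.txt
    mkdir -p "$root/work2/dir"
    cd "$root/work2"
    ln -s "$root/data/a.txt" dir/a.txt
    write_outputs                      # creates a new ./a.txt instead
    after_subdir=$(cat "$root/data/a.txt")

    echo "default staging:  $after_default"   # tool output
    echo "staged under dir: $after_subdir"    # original input
    ```

    Whether the real some-command behaves like this depends on how it opens its output files, which is outside your control; staging into a subdirectory simply removes the name collision.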