
Nextflow: how do you pass multiple output files from the publishDir to the next process?


I have a process that generates two files I am interested in, hitsort.cls and contigs.fasta. I publish these using publishDir:

process RUN_RE {
    publishDir "$baseDir/RE_output", mode: 'copy'
  
    input:
    file 'interleaved.fq'

    output:
    file "${params.RE_run}/seqclust/clustering/hitsort.cls"
    file "${params.RE_run}/contigs.fasta"

    script:
    """
    some_code

    """

  }

Now, I need these two files to be an input for another process but I don't know how to do that.

I have tried calling this process with

NEXT_PROCESS(params.hitsort, params.contigs)

while specifying the input as:

process NEXT_PROCESS {
  
    input:
    path hitsort
    path contigs

but it's not working, because only the basename is used instead of the full path. Basically what I want is to wait for RUN_RE to finish, and then use the two files it outputs for the next process.


Solution

  • It is best to avoid accessing files in the publishDir. As the Nextflow documentation for the publishDir directive warns:

    Files are copied into the specified directory in an asynchronous manner, thus they may not be immediately available in the published directory at the end of the process execution. For this reason files published by a process must not be accessed by other downstream processes.

    The recommendation is therefore to ensure your processes only access files in the working directory (i.e. ./work). In practice, this means avoiding absolute paths in your input and output declarations and letting Nextflow pass the files from one process to the next through its output channels. This also helps keep your workflows portable:

    nextflow.enable.dsl=2
    
    params.interleaved_fq = './path/to/interleaved.fq'
    params.publish_dir = './results'
    
    process RUN_RE {
    
        publishDir "${params.publish_dir}/RE_output", mode: 'copy'
    
        input:
        path interleaved
    
        output:
        path "./seqclust/clustering/hitsort.cls", emit: hitsort_cls
        path "./contigs.fasta", emit: contigs_fasta
    
        """
        # do something with ${interleaved}...
        ls -l "${interleaved}"
    
        # create some outputs...
        mkdir -p ./seqclust/clustering
        touch ./seqclust/clustering/hitsort.cls
        touch ./contigs.fasta
        """
    }
    
    process NEXT_PROCESS {
    
        input:
        path hitsort
        path contigs
    
        """
        ls -l
        """
    }
    
    workflow {
    
        interleaved_fq = file( params.interleaved_fq )
    
        NEXT_PROCESS( RUN_RE( interleaved_fq ) )
    }
    

    The above workflow block is effectively the same as:

    workflow {
    
        interleaved_fq = file( params.interleaved_fq )
    
        RUN_RE( interleaved_fq )
    
        NEXT_PROCESS( RUN_RE.out.hitsort_cls, RUN_RE.out.contigs_fasta )
    }
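
    If the outputs were declared without emit names, they could also be referenced by position. The following is a minimal sketch, assuming the same RUN_RE and NEXT_PROCESS definitions as above, and that the outputs are declared in the order shown (hitsort.cls first, contigs.fasta second):

    ```nextflow
    workflow {

        interleaved_fq = file( params.interleaved_fq )

        RUN_RE( interleaved_fq )

        // out[0] is the first declared output (hitsort.cls),
        // out[1] is the second declared output (contigs.fasta)
        NEXT_PROCESS( RUN_RE.out[0], RUN_RE.out[1] )
    }
    ```

    Named outputs (emit) are usually easier to read and do not break if the order of the output declarations changes, so the index form is mainly useful for quick scripts.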