
How to use the output of one process in another in Nextflow?


I have created a process create_parallel_params whose output is a parallel_params folder containing {0..9}.json. These files are created by parallel_paramgen.py.

process create_parallel_params{
    publishDir "./nf_output", mode: 'copy'
    input:
    val x
    output:
    path parallel_params
    
    script:
    """
    mkdir parallel_params | python $TOOL_FOLDERS/parallel_paramgen.py \
    parallel_params \
    $x
    """
}

I want to pass this create_parallel_params output into another process (searchlibrarysearch_molecularv2_parallelstep1) to take these JSON files in parallel and process them.

process searchlibrarysearch_molecularv2_parallelstep1{
    publishDir "./nf_output", mode: 'copy'
    input:
    path parallel_params
    path params.spectra
    path params.library
    output:
    file 'intermediateresults'

    script:
    """
    mkdir intermediateresults convert_binary librarysearch_binary | python $TOOL_FOLDERS/searchlibrarysearch_molecularv2_parallelstep.py \
    --parallelism 1 \
    $params.spectra \
    $parallel_params \
    $params.workflow_parameter \
    $params.library \
    intermediateresults \
    convert_binary \
    librarysearch_binary
    """
}



workflow {
    x =Channel.from(1)
    ch=create_parallel_params(x)
    searchlibrarysearch_molecularv2_parallelstep1(ch,params.spectra,params.library)
}

I need an example of how this could be done in Nextflow.


Solution

  • I have created a process create_parallel_params whose output is a parallel_params folder containing {0..9}.json.

    I want to pass this create_parallel_params output into another process to take these JSON files in parallel and process them.

    There are lots of ways to do this. If the parallel step needs only a single JSON file (and not the whole 'parallel_params' folder), the simplest approach might be to declare the multiple output files using a glob pattern, then use the flatten operator to emit each file separately. For example:

    params.spectra = './spectra'
    params.library = './library'
    
    params.workflow_parameter = 'foo'
    
    process create_step {
    
        output:
        path "parallel_params/*.json"
    
        """
        mkdir parallel_params
        touch parallel_params/{0..9}.json
        """
    }
    
    process parallel_step {
        tag { json_file }
    
        input:
        path json_file, stageAs: 'parallel_params/*'
        path spectra
        path library
    
        output:
        path 'intermediate_results'
    
        """
        mkdir intermediate_results
        echo searchlibrarysearch_molecularv2_parallelstep.py \\
            --parallelism 1 \\
            "${spectra}" \\
            "${json_file}" \\
            "${params.workflow_parameter}" \\
            "${library}" \\
            intermediate_results
        """
    }
    
    workflow {
    
        spectra = file(params.spectra)
        library = file(params.library)
    
        create_step()
    
        parallel_step(
            create_step.out.flatten(),
            spectra,
            library,
        )
    }
    

    Results:

    $ nextflow run main.nf 
    N E X T F L O W  ~  version 22.04.4
    Launching `main.nf` [desperate_murdock] DSL2 - revision: e94e1ddf9b
    executor >  local (11)
    [d3/839bb9] process > create_step                            [100%] 1 of 1 ✔
    [02/34437b] process > parallel_step (parallel_params/5.json) [100%] 10 of 10 ✔
    
    

    Note also that third-party scripts can be granted executable permission and moved into a directory called 'bin' in the root directory of your project repository, i.e. the same directory as your 'main.nf' Nextflow script. Nextflow automatically adds this folder to the $PATH environment variable, so your scripts become accessible to each of your pipeline processes. In this case, this lets you call your scripts without specifying the Python interpreter or an absolute path via your $TOOL_FOLDERS variable.
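
    The bin/ setup above can be sketched as follows. Note the script body written here is only a stand-in so the example is self-contained; in practice you would move your real parallel_paramgen.py into bin/ instead:

    ```shell
    # Sketch: make a third-party script directly callable from Nextflow processes.
    # The printf'd script body is a placeholder; substitute your real
    # parallel_paramgen.py (it must start with a shebang line).
    mkdir -p bin
    printf '#!/usr/bin/env python3\nprint("ok")\n' > bin/parallel_paramgen.py
    chmod +x bin/parallel_paramgen.py   # the executable bit is required

    # With the shebang and executable bit in place, a process script block
    # can call it by name, without 'python' or $TOOL_FOLDERS, e.g.:
    #   parallel_paramgen.py parallel_params 1
    ```

    Nextflow stages the project's bin/ directory onto $PATH for every task, so this works with local and most grid/cloud executors alike.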