
Manipulating input/output channels in Nextflow


I'm a new Nextflow user, and I'm struggling to get familiar with input/output channels in Nextflow. I know that Nextflow has DAG visualisation, a useful feature for drawing a directed graph of the workflow.

I have a small chart like this: [image]

I want to write a Nextflow script for the pipeline above. In particular, I expect the outputs of process A to be connected to processes B and C in a specific way, and the output names to be shown in the generated flowchart (when run with the -with-dag option).

If someone can help me, I'd appreciate it very much. Thanks.

Complement

This is my script. At my level, I can only use path for my outputs. This makes my script more verbose because of the file paths. Above all, when I use the DAG feature, the result isn't as clear as I expected compared to the initial flowchart.

#!/usr/bin/env nextflow

params.input_text = "abc"

process A{
    input:
    val text
    
    output:
    path A_folder
    
    """
    mkdir A_folder
    string=$text 
    for element in \$(seq 0 \$((\${#string}-1)))
    do
        echo \${string:\$element:1} > A_folder/\$element.txt        
    done
    """
}

process B{
    input:
    path A_folder
    
    output:
    path B_folder
    
    """
    mkdir B_folder
    echo \$(cat $A_folder/0.txt)\$(cat $A_folder/1.txt) > B_folder/glue1.txt  
    """
}

process C{
    input:
    path A_folder
    
    output:
    path C_folder
    
    """
    mkdir C_folder
    echo \$(cat $A_folder/2.txt | sed 's/c/3/g') > C_folder/tras.txt
    """
}

process D{
    input:
    path B_folder
    path C_folder
    
    output:
    path D_folder
    
    """
    mkdir D_folder
    echo \$(cat $C_folder/tras.txt)\$(cat $B_folder/glue1.txt) > D_folder/glue2.txt
    """
}

workflow{
    process_A = A(params.input_text)
    process_B = B(process_A)
    process_C = C(process_A)
    process_D = D(process_B, process_C)
}


In summary, my question is: after writing the code and running the script (nextflow run script.nf -with-dag flow.png), how can I get a flowchart as similar to the first chart as possible?


Solution

  • As of version 22.04.0, Nextflow can do DAG visualisation using the Mermaid renderer. All you need to do is change the output file extension to mmd, for example:

    nextflow run main.nf -with-dag flow.mmd
    

    We can also simplify the workflow a bit by using native execution (exec: blocks instead of shell scripts) to get close to the desired result:

    params.input_text = "abc"
    
    process process_A {
    
        input:
        val text
    
        output:
        val a, emit: foo
        val b, emit: bar
        val c, emit: baz
    
        exec:
        (a, b, c) = text.collect()
    }
    
    process process_B {
    
        input:
        val x
        val y
    
        output:
        val z
    
        exec:
        z = x + y
    }
    
    process process_C {
    
        input:
        val a
    
        output:
        val b
    
        exec:
        b = a.replaceAll('c', '3')
    }
    
    process process_D {
    
        input:
        val one
        val two
    
        output:
        val three
    
        exec:
        three = two + one
    }
    
    workflow {
    
        entry_input = Channel.of( params.input_text )
    
        (output_1, output_2, output_3) = process_A(entry_input)
    
        (output_4) = process_B( output_1, output_2 )
        (output_5) = process_C( output_3 )
    
        (final_output) = process_D( output_4, output_5 )
    
        final_output.view()
    }
    

    Results:

    $ nextflow run main.nf -with-dag flow.mmd
    N E X T F L O W  ~  version 23.04.1
    Launching `main.nf` [distraught_lorenz] DSL2 - revision: a1f4411ded
    executor >  local (4)
    [a7/e97d7c] process > process_A (1) [100%] 1 of 1 ✔
    [53/317d41] process > process_B (1) [100%] 1 of 1 ✔
    [b2/88be6d] process > process_C (1) [100%] 1 of 1 ✔
    [39/38f318] process > process_D (1) [100%] 1 of 1 ✔
    3ab
    
    $ cat flow.mmd 
    flowchart TD
        p0((Channel.of))
        p1[process_A]
        p2[process_B]
        p3[process_C]
        p4[process_D]
        p5([view])
        p6(( ))
        p0 -->|entry_input| p1
        p1 -->|output_1| p2
        p1 -->|output_2| p2
        p1 -->|output_3| p3
        p2 -->|output_4| p4
        p3 -->|output_5| p4
        p4 -->|final_output| p5
        p5 --> p6
    

    We can then produce an image with the Mermaid Live Editor and the 'default' theme:

    [image: rendered Mermaid diagram]
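
    If the steps have to stay file-based, as in the question, a similar effect might be had by declaring one path output per file instead of one folder. Below is a minimal sketch of process A under that assumption. The emit names first, second and third are made up for illustration, the script assumes a three-character input such as "abc", and it has not been checked against a particular Nextflow version:

    ```nextflow
    params.input_text = "abc"

    process process_A {

        input:
        val text

        output:
        // One declared output per file, so each file gets its own channel.
        // The emit names are hypothetical; pick whatever reads well.
        path '0.txt', emit: first
        path '1.txt', emit: second
        path '2.txt', emit: third

        script:
        """
        string='${text}'
        # Write each character of the input to its own file: 0.txt, 1.txt, ...
        for i in \$(seq 0 \$((\${#string} - 1)))
        do
            echo "\${string:\$i:1}" > "\$i.txt"
        done
        """
    }
    ```

    Downstream, process B could then declare two path inputs and receive process_A.out.first and process_A.out.second, while process C takes process_A.out.third.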

    Additional thoughts:

    Using parentheses around the channel declarations in the workflow block seems to prevent it from using the output channel names defined in the process blocks. Under the old DSL, the output of process_D (val three) was just shorthand for val three into three. Under DSL2, it appears the output channels still get named the same way but of course we no longer need the into keyword.
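
    For comparison, here is a sketch of the same workflow block without the parenthesised assignments, reading each channel from the process's out property instead. It reuses the processes defined above. The names foo, bar and baz come from the emit: options on process_A. How the edge labels in the generated diagram change with this form has not been verified here, so treat it as something to try rather than a confirmed fix:

    ```nextflow
    workflow {

        process_A( Channel.of( params.input_text ) )

        // Named outputs are reached through the emit names declared in process_A
        process_B( process_A.out.foo, process_A.out.bar )
        process_C( process_A.out.baz )

        // A process with a single output exposes it directly as .out
        process_D( process_B.out, process_C.out )

        process_D.out.view()
    }
    ```

    With the same input ("abc") this should print the same final value as the version above, since only the way the channels are referenced changes.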