
Nextflow process only working with first tuple element


I am currently developing a small Nextflow (v22.10.6.5843) pipeline that runs vcfancestralalleles.jar on several files. To do so, I decided to use a tuple. However, when I run the process, it is only executed on the first element of the tuple.

Here is my channel definition:

Channel
        .fromPath(params.input_ihs)
        .splitCsv(header:true)
        .map{ row -> [ row.chromosome, file(row.path_vcf), file(row.path_genetic_map), file(row.path_ancestral), file(row.path_manifest)] }
        .set{samples_ihs}

Here is my DSL2 process definition:

process vcf_ancestral_annotation {
    publishDir "${results_dir}/ancestral_annotation", mode:"copy"

    input:
    tuple val(chromosome), path(path_vcf), path(path_genetic_map), path(path_ancestral), path(path_manifest)
    file java_script

    output:
    tuple val(chromosome), path("chr${chromosome}_aa.vcf")

    """
    samtools faidx ${path_ancestral}

    java -jar ${java_script} \
    -m ${path_manifest} \
    ${path_vcf} |\
    bcftools annotate -x '^INFO/AA' > chr${chromosome}_aa.vcf
    """
}

Here is my workflow definition:

workflow {
    p1 = vcf_ancestral_annotation(samples_ihs, java_script)
}

And here is my input_ihs file:

chromosome,path_vcf,path_genetic_map,path_ancestral,path_manifest
21,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/21.phased.with.ref.vcf,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/chr21.b38.predicted.map,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/ancestral_fasta/ANCESTOR_for_chromosome_GRCh38_21_1_46709983_1.fa,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/manifest_annotation/manifest21
22,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/22.phased.with.ref.vcf,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/chr22.b38.predicted.map,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/ancestral_fasta/ANCESTOR_for_chromosome_GRCh38_22_1_50818468_1.fa,/mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/manifest_annotation/manifest22

At first I thought it was a problem with the input_ihs file. However, if I run:

samples_ihs.view()

I can see that the tuple actually contains the two elements of the input_ihs file:

[21, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/21.phased.with.ref.vcf, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/chr21.b38.predicted.map, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/ancestral_fasta/ANCESTOR_for_chromosome_GRCh38_21_1_46709983_1.fa, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/manifest_annotation/manifest21]
[22, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/22.phased.with.ref.vcf, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/test/data/ihs_files/chr22.b38.predicted.map, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/ancestral_fasta/ANCESTOR_for_chromosome_GRCh38_22_1_50818468_1.fa, /mnt/c/Users/fernanda_miron1/Documents/pop_north_developing/nf-selection/nf_modules/manifest_annotation/manifest22]

Any help would be greatly appreciated!


Solution

  • Issues like this almost always involve the use of multiple input channels:

    When two or more channels are declared as process inputs, the process waits until there is a complete input configuration, i.e. until it receives a value from each input channel. When this condition is satisfied, the process consumes a value from each channel and launches a new task, repeating this logic until one or more channels are empty.
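
    The behaviour described above can be reproduced with a minimal sketch. The process name pair_up and the channel contents are hypothetical, chosen only for illustration; the point is that the second input is a queue channel holding a single item.

    ```nextflow
    // Hypothetical process, for illustration only: it just echoes its two inputs.
    process pair_up {
        input:
        val chromosome
        val label

        output:
        stdout

        script:
        """
        echo "chromosome ${chromosome} with ${label}"
        """
    }

    workflow {
        // Queue channel with two items.
        chromosomes = Channel.of( 21, 22 )

        // Queue channel with ONE item: it is empty after the first task,
        // so no second task is launched for chromosome 22.
        label_queue = Channel.of( 'jar' )

        pair_up( chromosomes, label_queue ).view()
    }
    ```

    Swapping Channel.of( 'jar' ) for Channel.value( 'jar' ) should launch one task per chromosome, because a value channel is never exhausted.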

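    You've probably used a channel factory (for example Channel.fromPath) to create your java_script channel. That gives a queue channel holding a single item, so it is empty after the first task and the process stops there. I think you just need to make sure it is a value channel, which can be read any number of times. Note that a value channel is implicitly created by a process when it's invoked with a simple value. For example, you could try the following instead: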

    params.java_script = '/path/to/vcfancestralalleles.jar'
    
    ...
    
    workflow {
    
        java_script = file( params.java_script )
    
        Channel
            .fromPath( params.input_ihs )
            .splitCsv( header:true )
            .map { row ->
                [ row.chromosome, file(row.path_vcf), file(row.path_genetic_map), file(row.path_ancestral), file(row.path_manifest)]
            }
            .set { samples_ihs }
    
        p1 = vcf_ancestral_annotation(samples_ihs, java_script)
    }
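
    If you would rather keep a channel factory for the JAR, a possible alternative is to convert the resulting queue channel into a value channel with the first operator. This is only a sketch: it assumes the JAR was originally loaded with Channel.fromPath and that params.java_script points to a single file.

    ```nextflow
    workflow {

        // Assumption: the JAR is loaded with a channel factory.
        // first() emits the first item as a value channel,
        // which the process can read once per input tuple.
        java_script = Channel.fromPath( params.java_script ).first()

        samples_ihs = Channel
            .fromPath( params.input_ihs )
            .splitCsv( header:true )
            .map { row ->
                [ row.chromosome, file(row.path_vcf), file(row.path_genetic_map), file(row.path_ancestral), file(row.path_manifest)]
            }

        p1 = vcf_ancestral_annotation(samples_ihs, java_script)
    }
    ```

    Either way, the process should then run once for each row of the input_ihs file.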