I have multiple processes, and I pass the output of one as the input to another. I can't work out when a tuple input
works and when it doesn't.
main.nf
params.outdir_fastp = "./trimmed"
params.outdir_index = "./bwa_index"
params.rawFiles = "/Users/username/Downloads/tiny/tumor/*_R{1,2}_xxx.fastq.gz"
params.outdir_bwa_mem = "./bwamem"
params.hg38genome = "/Users/username/Downloads/NM.fasta"
params.gatk_mark_duplicates = "./gatk_mark_duplicates"

include { FASTP } from './fastp_process.nf'
include { bwa_index } from './index_process.nf'
include { align_bwa_mem } from './bwamem_process.nf'
include { gatk_markduplicates } from './gatk_markduplicates_process.nf'

workflow {
    read_pairs_ch = Channel.fromFilePairs( params.rawFiles )
    FASTP(read_pairs_ch)
    bwa_index(params.hg38genome)
    align_bwa_mem(FASTP.out.reads, bwa_index.out)
    gatk_markduplicates(align_bwa_mem.out.sorted_bams)
}
align_bwa_mem
process align_bwa_mem {
    tag { sample_id }
    debug true
    publishDir params.outdir_bwa_mem, mode: "copy"

    input:
    tuple val(sample_id), path(reads)
    tuple val(idxbase), path("bwa_index/*")

    output:
    path("${sample_id}.sorted.bam"), emit: sorted_bams

    script:
    def (fq1, fq2) = reads
    rg = "\'@RG\\tID:${sample_id}\\tSM:${sample_id}\\tPL:illumina\'"
    """
    bwa mem -M -R $rg -v 1 "bwa_index/${idxbase}" $fq1 $fq2 | samtools sort -O bam -T - > ${sample_id}.sorted.bam
    """
}
mark_picard_duplicates.nf
process gatk_markduplicates {
    tag { sample_id }
    debug true
    publishDir params.gatk_mark_duplicates, mode: "copy"

    input:
    tuple val(sample_id), path(x) // works fine with only `val(sample_id)`

    output:
    stdout

    script:
    """
    echo "$x\n"
    echo "$sample_id\n"
    """
}
The error I get is:
WARN: Input tuple does not match input set cardinality declared by process `gatk_markduplicates` -- offending value: /Users/username/Documents/name/nextflow_scripts/pipeline/work/2a/b24c4f7fe752389dda1085dace6509/tiny_t_L007.sorted.bam
ERROR ~ Error executing process > 'gatk_markduplicates (1)'
Caused by:
Not a valid path value type: org.codehaus.groovy.runtime.NullObject (null)
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
It works fine with val(sample_id) or path(sample_id) instead of tuple val(sample_id), path(sorted.bam).
I'd like to capture both the sample_id and the path of the sorted BAM file generated in the previous step.
I expected a tuple input to do exactly that, and I can't see why it fails.
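The problem is that align_bwa_mem emits a channel of plain paths, but gatk_markduplicates expects a channel of tuples. Each incoming item is a single path, so it cannot be split into val(sample_id) and path(x): the path element is left as null, which is what the cardinality warning and the "Not a valid path value type" error are reporting. To feed the sorted_bams channel directly into gatk_markduplicates, simply have align_bwa_mem emit a tuple: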
output:
tuple val(sample_id), path("${sample_id}.sorted.bam"), emit : sorted_bams
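To see the matching shapes end to end, here is a minimal sketch with two toy processes. MAKE_FILE and SHOW are hypothetical names used only for illustration and are not part of the pipeline above; the sketch assumes DSL2 (the default in recent Nextflow releases).

```nextflow
// Toy example: the upstream process emits [sample_id, path] tuples,
// so the downstream `tuple val(...), path(...)` input can unpack them.
process MAKE_FILE {
    input:
    val sample_id

    output:
    tuple val(sample_id), path("${sample_id}.txt"), emit: files

    script:
    """
    echo "${sample_id}" > ${sample_id}.txt
    """
}

process SHOW {
    debug true

    input:
    tuple val(sample_id), path(x)

    output:
    stdout

    script:
    """
    echo "${sample_id} ${x}"
    """
}

workflow {
    MAKE_FILE(Channel.of('a', 'b'))
    // Print each channel item to check that it is a two-element tuple
    MAKE_FILE.out.files.view()
    SHOW(MAKE_FILE.out.files)
}
```

Calling view() on a channel like this is a quick way to check what each item actually looks like before wiring it into the next process.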
Otherwise, the other way would be to use the map operator, for example, to produce the required tuples:
workflow {
...
align_bwa_mem.out.sorted_bams \
| map { tuple( it.getBaseName(2), it ) } \
| gatk_markduplicates
}
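As a rough illustration of what that map closure does, the sketch below applies it to a single path and prints the result. The file name is taken from the error message above; the file does not need to exist for this check, and the relative location is an assumption.

```nextflow
workflow {
    // file() returns a path object even if nothing exists at that location
    Channel.of( file('tiny_t_L007.sorted.bam') )
        // getBaseName(2) strips the last two extensions (".sorted.bam")
        .map { bam -> tuple( bam.getBaseName(2), bam ) }
        .view()
}
```

Each item printed should be a two-element tuple of the derived sample name and the original path, which is the shape gatk_markduplicates declares as its input.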
The first approach should be preferred, since it passes sample_id along explicitly instead of re-deriving it from the file name.