I've got a Nextflow process that looks like:
    process my_app {
        publishDir "${outdir}/my_app", mode: params.publish_dir_mode

        input:
        path input_bam
        path input_bai
        val output_bam
        val max_mem
        val threads
        val container_home
        val outdir

        output:
        tuple env(output_prefix), path("${output_bam}"), path("${output_bam}.bai"), emit: tuple_ch

        shell:
        '''
        my_script.sh \
            !{input_bam} \
            !{output_bam} \
            !{max_mem} \
            !{threads}
        output_prefix=$(echo !{output_bam} | sed "s#.bam##")
        '''
    }
This process only outputs two files (the .bam and its .bai), but my_script.sh also creates .vcf files that are not being published in the output directory.
To retrieve the files created by the script, I tried the following output declaration, but without success:
    output:
    tuple env(output_prefix), path("${output_bam}"), path("${output_bam}.bai"), path("${output_prefix}.*.vcf"), emit: mt_validation_simulation_tuple_ch
but in the logs I can see:
Error executing process caused by:
Missing output file(s) `null.*.vcf` expected by process `my_app_wf:my_app`
What am I missing? Could you help me? Thank you in advance!
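The problem is that output_prefix has only been defined inside of the shell block, as a Bash variable. The env(output_prefix) qualifier can capture that variable once the task has run, but "${output_prefix}" in the path declaration refers to a Groovy variable of the same name, which was never set; that is why the error mentions `null.*.vcf`. If all you need for your output prefix is the file's basename (without extension), you can just use a regular script block to check file attributes. Note that variables defined in the script block (but outside the command string) are global (within the process scope) unless they're defined using the def keyword: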
    process my_app {
        ...

        output:
        tuple val(output_prefix), path("${output_bam}{,.bai}"), path("${output_prefix}.*.vcf")

        script:
        output_prefix = output_bam.baseName
        """
        my_script.sh \\
            "${input_bam}" \\
            "${output_bam}" \\
            "${max_mem}" \\
            "${threads}"
        """
    }
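As a minimal sketch of that scoping rule on its own (scope_demo, sample_id and greeting are made-up names for illustration, not part of the original pipeline), the variable assigned without def should be usable in the output block, while the one declared with def stays local to the script block:

```nextflow
// Hypothetical demo process; only illustrates the 'def' vs. no-'def' rule described above.
process scope_demo {
    input:
    val sample_id

    output:
    tuple val(prefix), path("${prefix}.txt")

    script:
    // No 'def': visible in the whole process scope, so the output block can use it
    prefix = "${sample_id}.demo"
    // With 'def': local to the script block only
    def greeting = 'hello'
    """
    echo "${greeting}" > "${prefix}.txt"
    """
}

workflow {
    scope_demo( Channel.of('sampleA') )
    scope_demo.out.view()
}
```

Referencing greeting in the output block instead of prefix would be expected to fail in the same way as the original `null.*.vcf` error, since nothing by that name exists in the process scope.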
If the process creates the BAM (and index) it might even be possible to refactor away the multiple input channels if an output prefix can be supplied up front. Usually this makes more sense, but I don't have enough details to say one way or the other. The following might suffice as an example; you may need/prefer to combine/change the output declaration(s) to suit, but hopefully you get the idea:
    params.publish_dir = './results'
    params.publish_mode = 'copy'

    process my_app {
        publishDir "${params.publish_dir}/my_app", mode: params.publish_mode

        cpus 1
        memory 1.GB

        input:
        tuple val(prefix), path(indexed_bam)

        output:
        tuple val(prefix), path("${prefix}.bam{,.bai}"), emit: bam_files
        tuple val(prefix), path("${prefix}.*.vcf"), emit: vcf_files

        script:
        """
        my_script.sh \\
            "${indexed_bam.first()}" \\
            "${prefix}.bam" \\
            "${task.memory.toGiga()}G" \\
            "${task.cpus}"
        """
    }
Note that the indexed_bam input expects a tuple in the form tuple(bam, bai), which is why the script picks out the BAM with indexed_bam.first().
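For what it's worth, here is a rough sketch of how the input channel for this version of my_app could be put together. The file paths and the 'sample1.processed' prefix are placeholders I made up, so adjust them to your data:

```nextflow
workflow {
    // Placeholder paths (assumptions); replace with your real BAM and index
    def bam = file('/path/to/sample1.bam')
    def bai = file('/path/to/sample1.bam.bai')

    // Each channel item has the shape: [ prefix, [ bam, bai ] ]
    // The prefix differs from the input basename so that "${prefix}.bam"
    // does not clash with the staged input file in the task work directory.
    def indexed_bams = Channel.of( [ 'sample1.processed', [ bam, bai ] ] )

    my_app( indexed_bams )

    // Inspect what the process emits on the VCF channel
    my_app.out.vcf_files.view()
}
```

Keeping the BAM and its index together in one tuple element means both files get staged into the task directory, while the prefix travels with them as a plain value.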