How to declare a new Nextflow variable in a shell block?

I am currently writing my first Nextflow pipeline and I am trying to declare a new Nextflow variable in a script, but I can't manage to do it.

I would like to set up a variable, length_min, with a value read from a text file (parsed with awk) and use this value later in my pipeline as a parameter. This is what I've tried:

process get_min_len{

  input:
  file "./foo.tab"

  output:
  val length_min into min_len_channel

  shell:
  """
  !{length_min}=`awk '{if (\$1=="!{params.bar}") {print \$2}}' ./foo.tab`
  """
}

I get this error message:

Error executing process > 'get_min_max_len'

Caused by:
  No such variable: length_min

(I also tried to initialize length_min like so: length_min=0, but it does not work either.)

Is there a way to do that? Thanks!

Solution

You can use the env qualifier to capture a shell variable. For example:

params.foo = "foo.tab"
params.bar = "bar"

foo = file( params.foo )


process get_min_len{

  input:
  path foo

  output:
  env length_min into min_len_channel

  shell:
  '''
  length_min="$(awk '$1 == "!{params.bar}" { print $2 }' "!{foo}")"
  '''
}

However, defining a shell variable yourself and then capturing it won't avoid the creation of a file. The env qualifier just adds some syntactic sugar to your shell script at runtime, so an output file is still created. Using the example above, I get:

$ cat work/d4/37ad3bea12cb64089196744b6558bb/.command.sh 
#!/bin/bash -ue
length_min="$(awk '$1 == "bar" { print $2 }' "foo.tab")"

# capture process environment
set +u
echo length_min=$length_min > .command.env
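To see why a file is still involved, here is a minimal plain-bash sketch of the same round trip outside Nextflow. The sample `foo.tab` contents and the `bar` key are hypothetical. The script runs the same awk assignment, writes `.command.env` with the same capture line as the generated script above, and then reads the value back out of that file. The read-back step only illustrates that the value has to come from a file; it is not how Nextflow itself parses `.command.env`.

```shell
set -euo pipefail

# Work in a throwaway directory so the sketch does not touch real files.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical two-column, tab-separated input: key, minimum length.
printf 'foo\t10\nbar\t42\nbaz\t7\n' > foo.tab

# Same assignment as in the process's shell block, with params.bar = "bar".
length_min="$(awk '$1 == "bar" { print $2 }' foo.tab)"

# Same capture line as in the generated .command.sh (quoted here for safety).
echo "length_min=$length_min" > .command.env

# Getting the value back means reading the file and dropping the "length_min=" prefix.
captured="$(sed -n 's/^length_min=//p' .command.env)"
echo "$captured"
```

Running it prints `42`, and `.command.env` is left behind in the temporary directory, just like in a Nextflow work directory.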

A better way, therefore, is to write the value to a file yourself and have Nextflow read that file from the output channel to get the value. You can use the map operator for this:

process get_min_len{

  input:
  path foo

  output:
  path "length_min.txt" into min_len_channel

  shell:
  '''
  awk '$1 == "!{params.bar}" { print $2 }' "!{foo}" > "length_min.txt"
  '''
}

min_len_channel.map { it.text.strip() }.view()

Once you've read the contents of the file, you can call strip() to remove whitespace (spaces, newlines, etc.) from the beginning and end of the string. Alternatively, if your value might legitimately start or end with whitespace, it might be better to use AWK's printf to write the string without the trailing newline in the first place.
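The difference between the two AWK statements is easy to check from a shell. In this sketch, using a hypothetical one-line input, `print` appends the output record separator (a newline by default), so the file holds three bytes, while `printf "%s"` writes only the value, so the file holds two bytes.

```shell
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical input: key "bar" maps to the value 42.
printf 'bar\t42\n' > foo.tab

# print appends ORS (a newline by default) after the value.
awk '$1 == "bar" { print $2 }' foo.tab > with_print.txt

# printf with "%s" writes the value and nothing else.
awk '$1 == "bar" { printf "%s", $2 }' foo.tab > with_printf.txt

# Count bytes: "42\n" is 3 bytes, "42" is 2 bytes (tr removes wc's padding on some systems).
print_bytes="$(wc -c < with_print.txt | tr -d ' ')"
printf_bytes="$(wc -c < with_printf.txt | tr -d ' ')"
echo "print: $print_bytes bytes, printf: $printf_bytes bytes"
```

One caveat of the printf approach: if several rows match, `printf "%s"` concatenates their values with no separator, whereas `print` puts each value on its own line.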

Generally speaking, I would avoid a separate process like this unless the file you are parsing is large. If your input file is just a simple configuration file, you could potentially get away with something like:

foo = file( params.foo )

Channel
    .from( foo.text )
    .splitCsv(sep: '\t')
    .filter { col1, col2 -> col1 == params.bar }
    .map { col1, col2 -> col2 }
    .view()
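One detail differs between this channel pipeline and the awk command used in the processes: `splitCsv(sep: '\t')` splits on tabs only, whereas awk's default field splitting treats any run of spaces or tabs as a separator. The sketch below, using a hypothetical table whose value contains a space, shows that passing `-F'\t'` to awk gives the tab-only behaviour, while the default splitting cuts the value short.

```shell
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical tab-separated table; the value for "bar" contains a space.
printf 'foo\t10\nbar\t42 (approx)\n' > foo.tab

# Default field splitting: any run of blanks separates fields, so $2 is just "42".
default_split="$(awk '$1 == "bar" { print $2 }' foo.tab)"

# Tab-only splitting, comparable to splitCsv(sep: '\t') in the channel example.
tab_split="$(awk -F'\t' '$1 == "bar" { print $2 }' foo.tab)"

echo "default: [$default_split]"
echo "tab only: [$tab_split]"
```

For plain numeric values like a minimum length both forms give the same result; the distinction only matters if a key or value can contain spaces.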