I'm new to Nextflow and have been trying to build a small pipeline around some Python scripts I have. However, I've run into an issue with optional process inputs that I can't find a workaround for. I'm also curious what the best practices are for optional inputs and parameters.
#!/usr/bin/env nextflow
params.out = ""
params.kml_1 = null
params.kml_2 = null
params.loc = ""
params.new_data_1 = false
params.new_data_2 = false
process getPolygons {
    input:
    tuple val(db_table), path(path_to_kml), val(new_data)
    val loc
    path path_to_outdir
    def new_data_arg = new_data ? "--new_data" : ""
    def kml_arg = (path_to_kml != null) ? "--kml $path_to_kml" : ""
    script:
    """
    python3 ${baseDir}/bin/polygon_data.py --loc $loc --db_table $db_table $kml_arg $new_data_arg --outdir $path_to_outdir
    """
}
workflow {
    outdir_ch = Channel.fromPath(params.out)
    location_ch = Channel.of(params.loc)
    tables = [
        tuple("Table1", params.kml_1 ? params.new_data_1 : null, params.new_data_1),
        tuple("Table2", params.kml_2 ? params.new_data_2 : null, params.new_data_2)
    ]
    tables_ch = Channel.from(tables)
    getPolygons(tables_ch, location_ch, outdir_ch)
}
The code worked before I added the optional inputs. At that point tables was a plain list of table names rather than a list of tuples carrying the optional getPolygons parameters path_to_kml and new_data:
tables = ["Table1", "Table2"]
I keep running into the error
ERROR ~ No such variable: new_data
or ERROR ~ No such variable: path_to_kml
depending on the order of creating the variables new_data_arg and kml_arg.
The tuple approach is my latest attempt to get the optional parameters new_data and path_to_kml working; previously I passed them as separate inputs to getPolygons. Could the issue be that I create the variables new_data_arg and kml_arg and use them in the script, instead of using new_data and path_to_kml directly? If so, I'm not sure what the workaround is, because I need to apply some logic to new_data and path_to_kml before passing them to polygon_data.py.
I have found a solution to this that uses tuples. First, the ERROR ~ No such variable errors occurred because the variables new_data_arg and kml_arg were declared in the input block rather than inside the script block of the process (rookie mistake).
Next, I realized that the process would not iterate over the tuples, so I used the each input qualifier, passing every tuple in as the variable tuple_info. I also used "" instead of null for path_to_kml, since it is a path and null could cause issues. This is the final working version of my process:
process getPolygons {
    input:
    each tuple_info
    val loc
    path path_to_outdir

    script:
    // Unpack the tuple here; variables must be declared inside the script block.
    def (db_table, path_to_kml, new_data) = tuple_info
    def new_data_arg = new_data ? "--new_data" : ""
    // Groovy truth treats both null and "" as false, so either default omits --kml.
    def kml_arg = path_to_kml ? "--kml $path_to_kml" : ""
    """
    python3 ${baseDir}/bin/polygon_data.py --loc $loc --db_table $db_table $kml_arg $new_data_arg --outdir $path_to_outdir
    """
}
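On the best-practices part of the question, here is an untested alternative sketch, not the version I actually ran. It rests on two assumptions: that an empty list ([]) is accepted as a placeholder for a path input that was not supplied, and that passing params.loc and file(params.out) directly to the process makes them reusable values, so the tuple channel alone drives the iteration and each is not needed. It also assumes --out is always given on the command line.

```groovy
#!/usr/bin/env nextflow

params.out = ""
params.loc = ""
params.kml_1 = null
params.kml_2 = null
params.new_data_1 = false
params.new_data_2 = false

process getPolygons {
    input:
    tuple val(db_table), path(path_to_kml), val(new_data)
    val loc
    path path_to_outdir

    script:
    def new_data_arg = new_data ? "--new_data" : ""
    // Assumption: path_to_kml is an empty (falsy) value when [] was passed in.
    def kml_arg = path_to_kml ? "--kml $path_to_kml" : ""
    """
    python3 ${baseDir}/bin/polygon_data.py --loc $loc --db_table $db_table $kml_arg $new_data_arg --outdir $path_to_outdir
    """
}

workflow {
    // One tuple per table; [] stands in for a KML file that was not provided.
    tables_ch = Channel.of(
        tuple("Table1", params.kml_1 ? file(params.kml_1) : [], params.new_data_1),
        tuple("Table2", params.kml_2 ? file(params.kml_2) : [], params.new_data_2)
    )
    // Plain values (not queue channels) can be reused for every tuple.
    getPolygons(tables_ch, params.loc, file(params.out))
}
```

Compared with the each version, declaring path_to_kml as a path input should let Nextflow stage the KML file into the task directory instead of relying on the string being a valid absolute path.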
I also realized that I could have simplified the tables list, as there's no reason to build extra logic around params.kml_1 and params.kml_2 when the parameter defaults already cover the unset case.
tables = [
    tuple("Table1", params.kml_1, params.new_data_1),
    tuple("Table2", params.kml_2, params.new_data_2)
]