Nextflow flatMap not executing as expected


I have a map "data", and I need to pass each key/value pair under "c" to a process. The end result should look like:

carcode=somename
params={minimum=3000, ignore=60, maximum_A=2500, maximum_B=500}
carcode=somename2
params={minimum=5000, ignore=100, maximum_A=3500, maximum_B=22500}

I have written the code below. It works with a hardcoded key but not with the closure variable "it"; the relevant line is marked with //HERE.

data = [
    "a" : "A",
    "b" : "B",
    "c" : [
        "somename":[
            "z" : "Z",
            "y" : "Y",
            "params" :[
                "minimum": "3000",
                "ignore": "60",
                "maximum_A": "2500",
                "maximum_B": "500"
            ]
        ],

        "somename2":[
            "z" : "Z",
            "y" : "Y",
            "params" :[
                "minimum": "5000",
                "ignore": "100",
                "maximum_A": "3500",
                "maximum_B": "22500"
            ]
        ]
    ]
]

carcodes = Channel.from(data.c.keySet())
transform_carcodes = carcodes.flatMap { it ->  [it] }
//HERE
results = transform_carcodes.flatMap { it ->  [carcode: it, params: data.c."somename".params] }
//HERE
results.subscribe onNext: { println it }

Currently the output gets the proper keys but uses the value of the hardcoded key:

carcode=somename
params={minimum=3000, ignore=60, maximum_A=2500, maximum_B=500}
carcode=somename2
params={minimum=3000, ignore=60, maximum_A=2500, maximum_B=500}

Why doesn't it work when I do params: data.c.it.params?

I get the error: Cannot get property 'params' on null object

I have also tried toString(it).

Also, once I get the output, how can I pass each k/v pair to a process, so that a separate task is spawned for each pair?

process{
    container "python:3"

    script:
    """
    python3 some_file.py <key> <value>
    """
}

When run, this process should spawn:

python3 some_file.py somename {minimum=3000, ignore=60, maximum_A=2500, maximum_B=500}
python3 some_file.py somename2 {minimum=5000, ignore=100, maximum_A=3500, maximum_B=22500}

Solution

  • In Nextflow, this is how I managed to handle this issue:

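    The trick is to use data.c[it] and not data.c.it. With dot notation, Groovy looks up a key literally named "it"; no such key exists, so the lookup returns null and the following .params fails with the error from the question. The subscript form uses the value held in the variable it.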
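
The difference between the two lookups can be checked in plain Groovy, outside Nextflow. The following is a minimal sketch; the map is a trimmed-down, hypothetical version of the one in the question, and the variable name key is chosen only for illustration.

```groovy
// Hypothetical, trimmed-down version of the map from the question.
def data = [
    c: [
        somename : [params: [minimum: "3000"]],
        somename2: [params: [minimum: "5000"]]
    ]
]

def key = "somename2"

// Dot notation treats the name after the dot as a literal map key,
// so this looks for an entry called "key" and finds nothing.
assert data.c.key == null

// Subscript notation evaluates the variable and uses its value as the key.
assert data.c[key].params.minimum == "5000"

// Calling get() explicitly behaves the same way as the subscript form.
assert data.c.get(key).params.minimum == "5000"

// The same applies inside a closure, where the variable is "it".
def minimums = ["somename", "somename2"].collect { it -> data.c[it].params.minimum }
assert minimums == ["3000", "5000"]
```
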

    carcodes = Channel.from(data.c.keySet())
    transform_carcodes = carcodes.flatMap { it ->  [it] }
    results = transform_carcodes.flatMap { it ->  [ [it, data.c[it].params] ] }
    
    process A{
        echo true
    
        input:
        set x,y from results
    
        script:
        """
        python3.7 run_me.py ${x} \'${y}\'
        """
    
    }
    

    run_me.py

    import sys
    
    print("First:")
    print(sys.argv[1])
    print("Second:")
    print(sys.argv[2])
    

    And the output:

    [15/aec54c] process > A (1) [100%] 2 of 2 ✔
    First:
    somename
    Second:
    [minimum=3000, ignore=60, maximum_A=2500, maximum_B=500]
    
    First:
    somename2
    Second:
    [minimum=5000, ignore=100, maximum_A=3500, maximum_B=22500]
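
In the output above, the second argument reaches the script as the printed form of a Groovy map, which Python cannot parse directly. One possible option, not part of the original answer, is to serialize the parameters as JSON before they are emitted on the channel. The sketch below uses groovy.json.JsonOutput from the Groovy standard library; the params map is copied from the question.

```groovy
import groovy.json.JsonOutput

// Parameters for one entry, copied from the question's data map.
def params = [minimum: "3000", ignore: "60", maximum_A: "2500", maximum_B: "500"]

// Serialize the map to a compact JSON string; key order follows the map.
def json = JsonOutput.toJson(params)

assert json == '{"minimum":"3000","ignore":"60","maximum_A":"2500","maximum_B":"500"}'
```

Under this assumption, the channel would emit [it, JsonOutput.toJson(data.c[it].params)] instead of the raw map, and run_me.py could recover a dictionary with json.loads(sys.argv[2]).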