
Using Docker for only some processes in Nextflow


I am writing a Nextflow pipeline with multiple processes, most of which use Docker. Now I am trying to add a new process that only runs a Python script to preprocess some results, so no Docker image should be needed.

However, I get the error "Missing container image for process 'my_python_process'".

I define the Docker images in nextflow.config as follows:

process {
    withName: process1 {
        container = 'some/image1:1.0'
    }
    withName: process2 {
        container = 'some/image2:1.0'
    }
    withName: process3 {
        container = 'some/image3:1.0'
    }
}

docker {
    enabled = true
}

I found a discussion suggesting container = null for the process that doesn't need a container, but I still get the same error, no matter what the process script contains.

Does anyone know what I'm missing? Thank you!


Solution

  • With docker.enabled = true, Nextflow tries to run every process in a Docker container created from the image given by its container directive, so any process without that directive fails with the error you're seeing. The usual fix is to specify a 'base' or 'default' container for the whole workflow and override it per process where needed. For your Python step you'll want an image that ships with Python; as a general default, Ubuntu would be a good choice in my opinion.

    Note that the withName process selector has the highest priority, so it overrides the generic container setting for the process it names.

    process {
    
        container = 'ubuntu:22.04'
    
        withName: my_python_process {
            container = 'python:3.9'
        }
    
        withName: process1 {
            container = 'some/image1:1.0'
        }
        withName: process2 {
            container = 'some/image2:1.0'
        }
        withName: process3 {
            container = 'some/image3:1.0'
        }
    }
    
    docker {
        enabled = true
    }
    

    I'm not aware of a way to disable Docker execution for a particular process, and you wouldn't really want to anyway. The approach above should be preferred; as the Nextflow documentation on containers puts it:

    Containerization allows you to write self-contained and truly reproducible computational pipelines, by packaging the binary dependencies of a script into a standard and portable format that can be executed on any platform that supports a container runtime. Furthermore, the same pipeline can be transparently executed with any of the supported container runtimes, depending on which runtimes are available in the target compute environment.
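As a complementary option, the container can also be set directly in the process definition with the container directive, which keeps the image next to the script that needs it. The sketch below assumes a recent Nextflow release where DSL2 is the default; the input file results.txt, the output name preprocessed.txt and the trivial preprocessing logic are hypothetical placeholders. The Python code runs through a shebang line, which Nextflow supports for non-Bash scripts. As far as I know, a withName selector for the same process in nextflow.config would still take precedence over this directive.

```groovy
// main.nf: minimal sketch; file names and preprocessing logic are hypothetical
process my_python_process {
    // Process-level container directive; a withName selector for this
    // process in nextflow.config should take precedence over it
    container 'python:3.9'

    input:
    path results

    output:
    path 'preprocessed.txt'

    // The shebang makes Nextflow run this script with Python instead of Bash.
    // In the Groovy string, ${results} becomes the staged input file name
    // and '\\n' reaches Python as '\n'.
    script:
    """
    #!/usr/bin/env python3
    with open('${results}') as fh, open('preprocessed.txt', 'w') as out:
        for line in fh:
            out.write(line.strip() + '\\n')
    """
}

workflow {
    // 'results.txt' is a hypothetical input file
    my_python_process(Channel.fromPath('results.txt'))
}
```

Docker still has to be enabled for this to run in a container, either through the docker scope in nextflow.config as shown above or with the -with-docker option of nextflow run.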