Tags: kubernetes, nextflow

Output file not created when running an R command in a Nextflow file?


I am trying to run a Nextflow pipeline, but the output file is not created.

The main.nf file looks like this:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2 

process my_script {

    """
    Rscript script.R
    """
}

workflow {
  my_script 
}

In my nextflow.config I have:


process {
   executor = 'k8s'
   container = 'rocker/r-ver:4.1.3'
}

The script.R looks like this:

FUN <- readRDS("function.rds");
input = readRDS("input.rds");
output = FUN(
  singleCell_data_input = input[[1]], savePath = input[[2]], tmpDirGC = input[[3]]
);
saveRDS(output, "output.rds")

After running nextflow run main.nf, the output.rds file is not created.


Solution

  • Nextflow processes run independently of each other, each isolated inside its own working directory. For your script to be able to find the required input files, these must be localized (staged) inside the process working directory. This is done by defining an input block and declaring the files using the path qualifier, for example:

    params.function_rds = './function.rds'
    params.input_rds = './input.rds'
    
    
    process my_script {
    
        input:
        path my_function_rds
        path my_input_rds
    
        output:
        path "output.rds"
    
        """
        #!/usr/bin/env Rscript
    
        FUN <- readRDS("${my_function_rds}");
        input = readRDS("${my_input_rds}");
        output = FUN(
          singleCell_data_input=input[[1]], savePath=input[[2]], tmpDirGC=input[[3]]
        );
        saveRDS(output, "output.rds")
        """
    }
    
    workflow {
    
        function_rds = file( params.function_rds )
        input_rds = file( params.input_rds )
    
        my_script( function_rds, input_rds )
        my_script.out.view()
    }
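
    Note that, by default, declared output files are created inside each task's unique work directory (under ./work), not in the directory you launch Nextflow from; they are accessible there or through the process output channels. If you also want a copy of output.rds in a fixed location, one option is the publishDir directive. A minimal sketch, set here in your nextflow.config (the './results' path is just an example):

    process {
        executor = 'k8s'
        container = 'rocker/r-ver:4.1.3'

        // copy declared process outputs into ./results (illustrative path)
        publishDir = [ path: './results', mode: 'copy' ]
    }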
    

    In the same way, the script itself would need to be localized inside the process working directory. To avoid specifying an absolute path to your R script (which would make your workflow far from portable), it's possible to simply embed your code in the process definition, making sure to specify the Rscript shebang. This works because process scripts are not limited to Bash.

    Another way would be to make your R script executable and move it into a directory called bin in the root directory of your project repository (i.e. the same directory as your main.nf Nextflow script). Nextflow automatically adds this folder to the $PATH environment variable, so the script becomes accessible to each of your pipeline processes. For this to work, you'd need some way to pass in the input files as command-line arguments (see the layout sketch after the example below). For example:

    params.function_rds = './function.rds'
    params.input_rds = './input.rds'
    
    
    process my_script {
    
        input:
        path my_function_rds
        path my_input_rds
    
        output:
        path "output.rds"
    
        """
        script.R "${my_function_rds}" "${my_input_rds}" output.rds
        """
    }
    
    workflow {
    
        function_rds = file( params.function_rds )
        input_rds = file( params.input_rds )
    
        my_script( function_rds, input_rds )
        my_script.out.view()
    }
    

    And your R script might look like:

    #!/usr/bin/env Rscript
    
    args <- commandArgs(trailingOnly = TRUE)
    
    FUN <- readRDS(args[1]);
    input = readRDS(args[2]);
    output = FUN(
      singleCell_data_input=input[[1]], savePath=input[[2]], tmpDirGC=input[[3]]
    );
    saveRDS(output, args[3])
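
    For the bin/ approach to work, the script must be executable and sit next to main.nf. As a minimal sketch of the setup (assuming your project root contains main.nf and nextflow.config, and that script.R currently lives there too):

    # run from the project root directory
    mkdir -p bin
    mv script.R bin/script.R
    chmod +x bin/script.R    # the script must be executable to be invoked by name

    Nextflow then stages bin/ onto the task $PATH, so the bare script.R call in the process script block resolves correctly.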