bash matlab slurm gnu-parallel

GNU Parallel not passing strings to MATLAB


I'm trying to use GNU parallel to run a set of experiments using MATLAB on our supercomputer, which uses SLURM. I have a text file containing combinations of 4 parameters that are read in and passed to a MATLAB function. That text file is called gnu_parameters.txt and has 4 columns separated by a single space.

fs_method data_name use_vars 1
fs_method1 data_name use_vars 1
fs_method3 data_name use_vars 1 

where the parameters in columns 1-3 should be read in as strings, and parameter 4 is a number.

I want to run each combination of parameters in parallel to speed up the process. My SLURM script is below, but when I tell GNU-parallel where to put each parameter using the notation {1} {2} {3} {4}, I get an error that MATLAB doesn't recognize the variable fs_method. Looking at the log tells me that the error means fs_method isn't read as a string by MATLAB. To fix that, I tried adding single quotes in the SLURM script like so:

#!/bin/bash -l
#SBATCH --time=4-00:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1200g
#SBATCH --tmp=500g
#SBATCH --cpus-per-task=115
#SBATCH --mail-type=FAIL,END
#SBATCH --mail-user=myemail
#SBATCH -p groupPartition
cd $WRK_DIR
module load matlab
module load parallel
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))
echo $JOBS_PER_NODE
cat gnu_parameters.txt | parallel --jobs $JOBS_PER_NODE --joblog tasklog.log --progress --colsep ' ' 'matlab -nodisplay -r "run_holdout_parallel('{1}', '{2}', '{3}', {4});exit" ' 

Below are excerpts from the log file, the error file, and the output file.

Log

Seq Host    Starttime   JobRuntime  Send    Receive Exitval Signal  Command
1   :   1719498346.300      14.911  0   298 0   0   matlab -nodisplay -r "run_holdout_parallel(fs_method, data_name, use_vars, 1);exit" 
2   :   1719498361.751      14.387  0   298 0   0   matlab -nodisplay -r "run_holdout_parallel(fs_method1, data_name, use_vars, 1);exit" 
3   :   1719498376.666      14.385  0   298 0   0   matlab -nodisplay -r "run_holdout_parallel(fs_method3, data_name, use_vars, 1);exit" 

Error File

local:1/0/100%/0.0s sh: /dev/tty: No such device or address

local:1/0/100%/0.0s sh: /dev/tty: No such device or address

local:1/0/100%/0.0s {Unrecognized function or variable 'fs_method'.
}

local:0/1/100%/15.0s 

Output file

                            < M A T L A B (R) >
                  Copyright 1984-2023 The MathWorks, Inc.
             R2023b Update 7 (23.2.0.2515942) 64-bit (glnxa64)
                              January 30, 2024

 
To get started, type doc.
For product information, visit www.mathworks.com.
 

                            < M A T L A B (R) >
                  Copyright 1984-2023 The MathWorks, Inc.
             R2023b Update 7 (23.2.0.2515942) 64-bit (glnxa64)
                              January 30, 2024

 
To get started, type doc.
For product information, visit www.mathworks.com.

But adding the single quotes returns the same error. How can I get these parameters passed as strings to MATLAB? Is there a better way to run these experiments in parallel than my current approach?


Solution

  • I hate quoting. man parallel says:

    Conclusion: If this is confusing consider avoiding having to deal with quoting by writing a small script or a function (remember to export -f the function) and have GNU parallel call that.
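
    To illustrate why the single quotes in the question never reached MATLAB, here is a minimal sketch using only the shell, with the command cut down to two of the four arguments for brevity. Inside a single-quoted string, an inner ' ends the string, so the shell removes those quotes before GNU parallel sees the command. The '\'' idiom is one common way to keep a literal single quote. I would expect it to work for simple values like the ones in the question, but treat that as an assumption rather than something tested on your cluster.

    ```shell
    # The question's command string, reduced to arguments {1} and {4}.
    # The inner single quotes close and reopen the outer single-quoted
    # string, so the shell drops them.
    broken='matlab -nodisplay -r "run_holdout_parallel('{1}', {4});exit"'
    printf '%s\n' "$broken"

    # '\'' means: close the quote, add an escaped literal quote, reopen.
    # The single quotes around {1} now survive in the string.
    fixed='matlab -nodisplay -r "run_holdout_parallel('\''{1}'\'', {4});exit"'
    printf '%s\n' "$fixed"
    ```

    The first line printed has no quotes around {1}, which matches the commands recorded in tasklog.log. The second keeps them.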

    So in your case make a function:

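    # Wrapper so that all the quoting lives in one place.
    run_holdout() {
      echo "This should run_holdout_parallel on $1 $2 $3 $4"
      # \"...\" passes $1-$3 to MATLAB as double-quoted strings; $4 stays numeric.
      matlab -nodisplay -r "run_holdout_parallel(\"$1\", \"$2\", \"$3\", $4);exit"
    }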
    

    Once you can run that on the command line:

    $ run_holdout fs_method3 data_name use_vars 1 
    

    and that works, then parallelize with:

    $ export -f run_holdout
    $ ... | parallel --colsep ' ' run_holdout {1} {2} {3} {4}
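
    If you want to check the wrapper's quoting without starting MATLAB, a rough sketch is to temporarily shadow matlab with a shell function that only prints its arguments. The stub below is hypothetical and for inspection only. Remove it (and do not export it) before running the real job.

    ```shell
    # Hypothetical stand-in for the real matlab binary: prints each
    # argument on its own line so the quoting can be inspected.
    matlab() { printf '%s\n' "$@"; }

    # Same wrapper idea as above, without the echo.
    run_holdout() {
      matlab -nodisplay -r "run_holdout_parallel(\"$1\", \"$2\", \"$3\", $4);exit"
    }

    # Call it the way GNU parallel would for one line of gnu_parameters.txt.
    result=$(run_holdout fs_method3 data_name use_vars 1)
    printf '%s\n' "$result"
    ```

    The last line printed is the exact text MATLAB would receive after -r. The first three parameters should appear in double quotes and the fourth as a bare number.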