Does createJob/createTask work for my function? What is the difference between creating multiple jobs and creating multiple tasks in one job?


I want to run multiple completely independent scripts in parallel. They differ from each other only in one or two parameters, so I wrote the main part as a function and pass the parameters via createJob and createTask, as follows:

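% Run_DMRG_HubbardKondo
UList = [1, 2, 4, 8];            % values of U to scan
J_UList = [-1, 0:0.2:2];         % values of J_U to scan
c = parcluster;                  % cluster from the default profile
c.NumThreads = 3;                % computational threads per worker
j = createJob(c);                % a single job that holds all tasks
for iU = 1:numel(UList)
    for iJ_U = 1:numel(J_UList)
        % one task per (U, J_U) pair; the inner cell holds the two input arguments
        t = createTask(j, @DMRG_HubbardKondo, 0, {{UList(iU), J_UList(iJ_U)}});
    end
end
submit(j);
wait(j,'finished')               % block until every task has finished
delete(j);                       % remove the job and its data from the cluster
clear j t
exit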
function DMRG_HubbardKondo(U_Job, J_U_Job)
...% (skipped)
end

What if I call createJob multiple times, each with a single createTask? I know createJob has some options, such as AttachedFiles. But in terms of independence, is there any difference between the two approaches? I ask about independence because the DMRG_HubbardKondo function calls setenv internally, as follows:

function DMRG_HubbardKondo(U_Job, J_U_Job)
...% (skipped)
DirTmp = '/tmp/swan';
setenv('LMA', DirTmp)
Para.DateStr = datestr(datetime('now'),30);
% RCDir named by parameter and datetime
Para.RCDir = [DirTmp,'/RCStore',Para.DateStr,sprintf('U%.4gJ%.4g', [U_Job,J_U_Job])];
k = [strfind(Para.Symm,'SU2'), strfind(Para.Symm,'-v')];
if ~isempty(k)
    RC = Para.RCDir
    if exist(RC, 'dir')==0
        mkdir(RC);    % create the directory if it does not exist
        fprintf('%s made.\n', RC)   % pass RC as data, not as the format string
    end
    setenv('RC_STORE', RC);
    setenv('CG_VERBOSE', '0');
end
... % (skipped)
end

The main function, DMRG_HubbardKondo, uses some MEX-compiled functions that apply the Wigner-Eckart theorem. Specifically, at every step they generate and retrieve data (Clebsch-Gordan coefficients) in RCDir. I assume these MEX functions locate the corresponding RCDir via getenv, and I want to know whether createJob/createTask will handle this correctly.
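One way to check empirically whether environment variables set by one task can leak into another on a given cluster is a small diagnostic job like the sketch below. Each task first reads RC_STORE, then overwrites it with its own tag; if any task reports another task's tag, the two ran in the same MATLAB process. The function name envProbe and the tag values are hypothetical, and this assumes the workers can resolve a local function in a script (the same pattern the script above relies on); if they cannot, save envProbe as its own envProbe.m on the path or attach it to the job.

```matlab
% Diagnostic sketch (hypothetical): does an environment variable set by one
% task remain visible to a later task on this cluster?
tags = {'probe1', 'probe2', 'probe3', 'probe4', 'probe5', 'probe6'};
args = cellfun(@(t) {t}, tags, 'UniformOutput', false);  % {{'probe1'}, {'probe2'}, ...}
c = parcluster;
j = createJob(c);
createTask(j, @envProbe, 1, args);   % one task per tag, one output each
submit(j);
wait(j, 'finished');
before = fetchOutputs(j);            % one row per task: RC_STORE as seen at task start
delete(j);
leaked = ismember(before, tags);     % true where a task saw some task's tag
if any(leaked)
    disp('RC_STORE leaked between tasks (a worker process was reused).');
else
    disp('No task saw another task''s RC_STORE value.');
end

function old = envProbe(tag)
% Return RC_STORE as found when the task starts, then overwrite it.
old = getenv('RC_STORE');
setenv('RC_STORE', tag);
end
```

Using more tasks than there are workers makes it more likely that a reused process, if any, would show up in the result.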

In summary, my questions are:

  1. What is the difference between creating multiple tasks in one job and creating multiple jobs, each with one task?
  2. Will createJob/createTask work for my function?
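For reference, the second layout in question 1 (one job per parameter pair, each with a single task) could be sketched as below. It assumes the same DMRG_HubbardKondo function and default cluster profile as the script above; the variable name jobs is illustrative.

```matlab
% Alternative layout: one job per (U, J_U) pair, each holding a single task.
UList = [1, 2, 4, 8];
J_UList = [-1, 0:0.2:2];
c = parcluster;
c.NumThreads = 3;
jobs = cell(numel(UList), numel(J_UList));   % keep handles for wait/delete later
for iU = 1:numel(UList)
    for iJ_U = 1:numel(J_UList)
        jobs{iU, iJ_U} = createJob(c);
        % a cell of plain values creates ONE task with these two input arguments
        createTask(jobs{iU, iJ_U}, @DMRG_HubbardKondo, 0, ...
            {UList(iU), J_UList(iJ_U)});
        submit(jobs{iU, iJ_U});
    end
end
% Wait for every job, then remove its data from the cluster.
for k = 1:numel(jobs)
    wait(jobs{k}, 'finished');
    delete(jobs{k});
end
```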

I know that sbatch works if I write a function that passes the parameters into Submit.sh, as follows:

function GenSubmitsh(partition,nodeNo,TLim,NCore,mem,logName,JobName,ParaName,ScriptName)

if isnan(nodeNo)
    nodeStr = '##SBATCH --nodelist=auto \n';
else
    nodeStr = sprintf('#SBATCH --nodelist=node%g \n',nodeNo);
end

Submitsh = sprintf([
    '#!/bin/bash -l \n',...
    '#SBATCH --partition=%s \n',...
    nodeStr,...
    '#SBATCH --exclude=node1051 \n',...
    '#SBATCH --time=%s \n',...
    '#SBATCH --nodes=1 \n',...
    '#SBATCH --ntasks=1 \n',...
    '#SBATCH --cpus-per-task=%g \n',...
    '#SBATCH --mem=%s \n',...
    '#SBATCH --output=%s \n',...
    '#SBATCH --job-name=%s \n',...
    '\n',...
    '##Do not remove or change this line in GU_CLUSTER \n',...
    '##export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK     \n',...
    '\n',...
    'echo "Job Started At" \n',...
    'date \n',...
    '\n',...
    'matlab -nodesktop -nojvm -nodisplay -r "ParaName=''%s'',%s" \n',...
    '\n',...
    'echo "Job finished at" \n',...
    'date \n'],...
    partition,TLim,NCore,mem,logName,JobName,ParaName,ScriptName);

fileID = fopen('Submit.sh','w');
fprintf(fileID,'%s',Submitsh);
fclose(fileID);

end

I hope createJob/createTask will behave equivalently (i.e., each run is completely independent).


Solution

  • There are only minor differences between multiple createJob calls, each with a single createTask, vs. a single createJob with multiple createTask calls. I would say it is generally better to use a single Job with multiple Tasks, unless you have a specific reason not to. Here are some considerations:

    • Having a single Job object allows some stages of the submission process to be done once instead of multiple times (e.g. parts of the file-attachment step).
    • It is possible (although admittedly awkward) to vectorise the calls to createTask. (This doesn't affect execution)
    • On the MATLAB Job Scheduler (MJS) system, you can set more properties per Job object, such as a range of workers to be used during execution
    • When using schedulers such as SLURM, multiple Tasks of a single Job can be submitted to the scheduler as a "job array", which I believe can be more efficient for the scheduler itself.
    • When using schedulers other than MJS, each Task runs in a fresh MATLAB worker process, regardless of whether it is the only Task in a Job or not.
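The vectorised form of createTask mentioned above could look like the following sketch, which reuses the parameter lists from the question. Passing a cell array whose elements are themselves cell arrays makes createTask create one task per inner cell; ndgrid is used here only as one convenient way to enumerate every (U, J_U) pair.

```matlab
% Sketch: create all 48 tasks with a single vectorised createTask call.
UList = [1, 2, 4, 8];
J_UList = [-1, 0:0.2:2];
[UGrid, JGrid] = ndgrid(UList, J_UList);     % every (U, J_U) combination
args = arrayfun(@(u, jU) {u, jU}, UGrid(:).', JGrid(:).', ...
    'UniformOutput', false);                  % 1x48 cell; each element is {U, J_U}
c = parcluster;
c.NumThreads = 3;
j = createJob(c);
createTask(j, @DMRG_HubbardKondo, 0, args);  % one task per inner cell array
submit(j);
wait(j, 'finished');
delete(j);
```

Whether this reads better than the nested loops is a matter of taste; the resulting tasks execute the same way.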