I have a piece of Python code that uses joblib and multiprocessing to run parts of the computation in parallel. It runs fine on my desktop, where Task Manager shows it using all four cores in parallel.
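For reference, the joblib usage in question is along these lines (the function and inputs here are placeholders, not my actual code):

```python
from joblib import Parallel, delayed

def process_item(x):
    # placeholder for the real per-item computation
    return x * x

# n_jobs=4 spreads the calls across four worker processes
results = Parallel(n_jobs=4)(delayed(process_item)(x) for x in range(10))
print(results)
```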
I recently learnt that I have access to an HPC cluster with 100+ nodes of 20 cores each. The cluster uses SLURM as the workload manager.
The first question is: Is it possible to run parallelized Python code on a cluster?
If it is possible,
Does the Python code I have need to be changed at all to run on the cluster, and
What #SBATCH directives need to go in the job submission file to tell it that the parallelized parts of the code should run on four cores (or should that be four nodes)?
The cluster I have access to has the following attributes:
PARTITION CPUS(A/I/O/T) NODES(A/I) TIMELIMIT MEMORY CPUS SOCKETS CORES
standard 324/556/16/896 34/60 5-00:20:00 46000+ 8+ 2 4+
Typically MPI is considered the de facto standard for High-Performance Computing. There are a few MPI bindings for Python, the most mature and widely used being mpi4py. There are also a number of higher-level frameworks for running Python on clusters, such as Dask and Ray.
Your code will require at least minimal changes, but they shouldn't be extensive.
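Before committing to an MPI port, note that joblib and multiprocessing only use cores on a single node. If four cores on one node are enough, your existing code can run unchanged; a sketch of a submission script (the partition name comes from your sinfo output, the script name is a placeholder) might look like:

```shell
#!/bin/bash
#SBATCH --job-name=joblib-test
#SBATCH --partition=standard
#SBATCH --nodes=1            # joblib/multiprocessing cannot span nodes
#SBATCH --ntasks=1           # a single Python process...
#SBATCH --cpus-per-task=4    # ...with four cores for its worker processes
#SBATCH --time=01:00:00

python myscript.py
```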
When you port the code to MPI, you run a single process per core, so you will no longer need the multiprocessing module.
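A minimal sketch of that pattern with mpi4py (the work function and data are placeholders): each process discovers its rank and takes a strided slice of the work. The import fallback lets the script also run serially without MPI installed.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # fall back to serial execution when mpi4py is unavailable
    rank, size = 0, 1

data = list(range(100))          # placeholder work items
my_items = data[rank::size]      # each rank takes a strided slice
my_results = [x * x for x in my_items]
print(f"rank {rank}/{size} processed {len(my_items)} items")
```

Launched as, say, `mpirun -n 20 python myscript.py`, each of the 20 processes handles its own slice; with mpi4py you could then collect everything on rank 0 with `comm.gather(my_results, root=0)`.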
So, for example, if you have 100 nodes with 20 cores each, you will run 2000 Python processes.
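Under SLURM you then request one task per core and launch the MPI processes with srun. A sketch of a submission script (script name is a placeholder, and the counts are scaled down to four of your 20-core nodes):

```shell
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --partition=standard
#SBATCH --nodes=4             # four 20-core nodes...
#SBATCH --ntasks-per-node=20  # ...one MPI process per core = 80 processes
#SBATCH --time=01:00:00

srun python myscript.py
```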