This is a long shot, but perhaps someone can help. I'm running a model (SWAN) on Windows 10, using its MPI build with MPICH2 (1.4.1p1).
I have two NUMA nodes with 36 cores each. For some reason I can't run the model on all 72 cores.
I'm running the model using mpiexec -n <np> swan.exe or swanrun inputfile <np>. If I specify mpiexec -n 72, the model starts 72 processes but only uses the 36 cores of one node. Even if I run 2 or more models at the same time, they run on the same node, leaving 36 cores pretty much idle.
I'm assuming I made a mistake when installing MPICH2, but I can't quite figure out where I went wrong. I simply installed MPICH2 using the binary provided here (http://www.mpich.org/static/downloads/1.4.1p1/). Is there some option I overlooked where I have to install it for both nodes separately?
After some digging I realised that I had multiple versions of MPI installed on my machine. I'm still not sure why my model would only run on one of the NUMA nodes at a time (I don't know which MPI version's mpiexec was being called), but I uninstalled all MPI versions and did a clean reinstall. I can now run on all 72 cores.
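
For anyone who suspects the same problem, here is a minimal sketch of how to check which mpiexec a bare command would resolve to when several MPI installs are on the PATH. It is not part of SWAN or MPICH2; the helper name find_executables is made up for illustration, and it assumes Python is available on the machine. It walks the PATH in order and lists every match, so the first line printed is the one the shell would pick.

```python
import os


def find_executables(name, path=None, pathext=None):
    """Return every file matching `name` on the search path, in lookup order.

    `find_executables` is a hypothetical helper written for this post,
    not part of any MPI distribution.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        # On Windows, PATHEXT lists the extensions tried for a bare command name.
        pathext = os.environ.get("PATHEXT", "") if os.name == "nt" else ""
    extensions = [""] + [ext for ext in pathext.split(os.pathsep) if ext]

    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate) and candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    # More than one line of output means more than one mpiexec is installed;
    # the first one listed is the one a bare `mpiexec` command would run.
    for hit in find_executables("mpiexec"):
        print(hit)
```

On Windows, running where mpiexec in a command prompt should give the same kind of listing without needing Python.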