
Why do I have to call MPI.Finalize() inside the destructor?


I'm currently trying to understand mpi4py. I set mpi4py.rc.initialize = False and mpi4py.rc.finalize = False because I can't see why we would want automatic initialization and finalization. The default behavior is that MPI.Init() gets called when MPI is imported. I think the reason for that is that each rank runs its own instance of the Python interpreter, and each of those instances runs the whole script, but that's just a guess. In the end, I like to have it explicit.
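
For illustration, with the defaults left in place, MPI is already initialized right after the import; a quick check with MPI.Is_initialized() (part of mpi4py's API):

# Default behavior: importing MPI calls MPI_Init() automatically.
from mpi4py import MPI

print(MPI.Is_initialized())  # True, without any explicit MPI.Init()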

Now this introduced some problems. I have this code

import numpy as np
import mpi4py
mpi4py.rc.initialize = False  # do not initialize MPI automatically
mpi4py.rc.finalize = False # do not finalize MPI automatically

from mpi4py import MPI # import the 'MPI' module
import h5py

class DataGenerator:
    def __init__(self, filename, N, comm):
        self.comm = comm
        self.file = h5py.File(filename, 'w', driver="mpio", comm=comm)

        # Create datasets
        self.data_ds = self.file.create_dataset("indices", (N, 1), dtype='i')

    def __del__(self):
        self.file.close()
        

if __name__=='__main__':
    MPI.Init()
    world = MPI.COMM_WORLD
    world_rank = MPI.COMM_WORLD.rank

    filename = "test.hdf5"
    N = 10
    data_gen = DataGenerator(filename, N, comm=world)

    MPI.Finalize()

which results in

$ mpiexec -n 4 python test.py
*** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[eu-login-04:01559] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[eu-login-04:01560] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
*** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[eu-login-04:01557] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[15050,1],3]
  Exit code:    1
--------------------------------------------------------------------------

I am a bit confused as to what's going on here. If I move the MPI.Finalize() to the end of the destructor, it works fine.
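
For reference, that working variant looks roughly like this (only the destructor changes; the explicit MPI.Finalize() at the end of the main block is dropped):

    def __del__(self):
        self.file.close()  # needs MPI to still be initialized
        MPI.Finalize()     # so finalize only after the file is closed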

Note that I also use h5py, which uses MPI for parallelization, so I have parallel file I/O here. Note that h5py needs to be compiled with MPI support; you can do that by setting up a virtual environment and running pip install --no-binary=h5py h5py.
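
By the way, you can verify that an h5py build actually has MPI support via its build configuration:

import h5py

# True only if h5py was built against a parallel (MPI-enabled) HDF5
print(h5py.get_config().mpi)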


Solution

  • The way you wrote it, data_gen lives until the interpreter shuts down, but you call MPI.Finalize inside the main block. The destructor therefore runs after finalize. The h5py.File.close method seems to call MPI.Comm.Barrier internally, and calling that after finalize is forbidden by the MPI standard (a short demonstration of this ordering follows the code below).

    If you want explicit control, make sure all objects are destroyed before calling MPI.Finalize. Even that may not be enough, because some objects may only be cleaned up by the garbage collector, not by the reference counter.

    To avoid this, use context managers instead of destructors.

    import mpi4py
    mpi4py.rc.initialize = False  # keep MPI init/finalize explicit, as above
    mpi4py.rc.finalize = False

    from mpi4py import MPI
    import h5py


    class DataGenerator:
        def __init__(self, filename, N, comm):
            self.comm = comm
            self.file = h5py.File(filename, 'w', driver="mpio", comm=comm)
    
            # Create datasets
            self.data_ds = self.file.create_dataset("indices", (N, 1), dtype='i')
    
        def __enter__(self):
            return self
    
        def __exit__(self, exc_type, exc_value, traceback):
            self.file.close()
    
    
    if __name__=='__main__':
        MPI.Init()
        world = MPI.COMM_WORLD
        world_rank = MPI.COMM_WORLD.rank
    
        filename = "test.hdf5"
        N = 10
        with DataGenerator(filename, N, comm=world) as data_gen:
            pass
        MPI.Finalize()
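
    As a short demonstration of the original ordering problem (the Demo class here is made up for illustration): a module-level object's destructor only runs at interpreter shutdown, i.e. after the explicit MPI.Finalize().

    import mpi4py
    mpi4py.rc.initialize = False
    mpi4py.rc.finalize = False

    from mpi4py import MPI

    class Demo:
        def __del__(self):
            # Runs during interpreter shutdown, after MPI.Finalize() below.
            print("destructor running, MPI finalized:", MPI.Is_finalized())

    if __name__ == '__main__':
        MPI.Init()
        d = Demo()
        MPI.Finalize()
        # 'd' is still referenced here; its __del__ fires later and prints True.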