
Derived datatype as base pointer for MPI window


I want to use a derived datatype with MPI-3 shared memory. Given the following derived datatype:

type :: pve_data
  type(pve_data), pointer :: next => NULL()
  real*8        , pointer :: array(:) => NULL()
end type pve_data

Initialized with:

type(pve_data) :: pve_grid
allocate(pve_grid%array(10))
pve_grid%array = 0.0d0

After having done some calculations with pve_grid, I want to create a shared-memory window with MPI_WIN_ALLOCATE_SHARED whose base pointer should be pve_grid. (MPI_WIN_CREATE_DYNAMIC might be an alternative, but I need the performance of shared memory.)

1) Until now I have only used primitive datatypes, or arrays of them, as the base pointer for window creation. Can a derived datatype also be used as a base pointer? Or do I need to create a window for every component of the derived datatype that is a primitive variable?

2) Is it possible to use an already "used" variable (pve_grid in this case) as the base pointer? Or do I need to use a new pve_data as the base pointer and copy the values from pve_grid into it?

EDIT I know that it would be easier to use an OpenMP approach instead of MPI shared memory. But I want to try an MPI-only approach on purpose, to improve my MPI skills.

EDIT2 (05.09.16) I have made some progress and was able to use shared memory where the base pointer was a simple integer variable. But I still have a problem when I want to use a derived datatype as the base pointer for the window creation (for testing purposes I changed its definition - see sharedStructMethod.f90 below). Compilation and execution do not throw any errors; the remote access simply has no effect on the derived datatype's components: the WRITE shows the old values that were initialized by the parent. The code below shows my current state. I use the possibility of spawning new processes at execution time: the parent process creates the window and the child processes make changes to it. I hope spawning processes does not complicate debugging; I just added it for my project. (And next time I will change real*8 to conform to the standard.)

Declaration of the derived datatype (sharedStructMethod.f90):

  module sharedStructMethod

     REAL*8, PARAMETER  :: prec = 1d-13
     INTEGER, PARAMETER :: masterProc = 0
     INTEGER, PARAMETER :: SLAVE_COUNT = 2
     INTEGER, PARAMETER :: CONSTSIZE = 10

     ! Struct definitions
     type :: vertex
        INTEGER, DIMENSION(3) :: coords
     end type vertex

     type :: pve_data
        real(kind(prec)), pointer :: intensity(:) => NULL()
        logical, pointer          :: flag => NULL()
        type(vertex), pointer     :: vertices(:) => NULL()
     end type pve_data

  end module sharedStructMethod

Declaration of the parent process (sharedStruct.f90), which the user executes:

  PROGRAM sharedStruct
     USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
     USE mpi
     USE sharedStructMethod
     IMPLICIT NONE

     type(pve_data)                 :: pve_grid
     integer                        :: ierror
     integer                        :: myRank, numProcs
     INTEGER                        :: childComm
     INTEGER                        :: childIntracomm
     integer                        :: i
     INTEGER(KIND=MPI_ADDRESS_KIND) :: memSize
     INTEGER                        :: dispUnit
     TYPE(C_PTR)                    :: basePtr
     INTEGER                        :: win
     TYPE(pve_data), POINTER        :: shared_data

     call MPI_INIT(ierror)
     memSize  = sizeof(pve_grid)
     dispUnit = 1
     CALL MPI_COMM_SPAWN("sharedStructWorker.x", MPI_ARGV_NULL, SLAVE_COUNT, MPI_INFO_NULL, &
                         masterProc, MPI_COMM_SELF, childComm, MPI_ERRCODES_IGNORE, ierror)
     CALL MPI_INTERCOMM_MERGE(childComm, .false., childIntracomm, ierror)
     CALL MPI_WIN_ALLOCATE_SHARED(memSize, dispUnit, MPI_INFO_NULL, childIntracomm, basePtr, win, ierror)
     CALL C_F_POINTER(basePtr, shared_data)
     CALL MPI_WIN_LOCK(MPI_LOCK_EXCLUSIVE, masterProc, 0, win, ierror)

     allocate(shared_data%intensity(CONSTSIZE))
     allocate(shared_data%vertices(CONSTSIZE))
     allocate(shared_data%flag)
     shared_data%intensity = -1.0d0
     DO i = 1, CONSTSIZE
        shared_data%vertices(i)%coords(1) = -1
        shared_data%vertices(i)%coords(2) = -2
        shared_data%vertices(i)%coords(3) = -3
     END DO
     shared_data%flag = .true.

     CALL MPI_WIN_UNLOCK(masterProc, win, ierror)
     CALL MPI_BARRIER(childIntracomm, ierror)
     CALL MPI_BARRIER(childIntracomm, ierror)
     WRITE(*,*) "After: Flag ", shared_data%flag, " intensity(1): ", shared_data%intensity(1)
     call MPI_FINALIZE(ierror)
  END PROGRAM sharedStruct

And last but not least: the declaration of the child process (sharedStructWorker.f90), which is spawned automatically by the parent process at runtime and changes the window content:

  PROGRAM sharedStructWorker
     USE mpi
     USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
     USE sharedStructMethod
     IMPLICIT NONE

     INTEGER                        :: ierror
     INTEGER                        :: myRank, numProcs
     INTEGER                        :: parentComm
     INTEGER                        :: parentIntracomm
     TYPE(C_PTR)                    :: pveCPtr
     TYPE(pve_data), POINTER        :: pve_gridPtr
     INTEGER                        :: win
     INTEGER(KIND=MPI_ADDRESS_KIND) :: sizeOfPve
     INTEGER                        :: dispUnit2

     CALL MPI_INIT(ierror)
     CALL MPI_COMM_GET_PARENT(parentComm, ierror)
     CALL MPI_INTERCOMM_MERGE(parentComm, .true., parentIntracomm, ierror)
     sizeOfPve = 0_MPI_ADDRESS_KIND
     dispUnit2 = 1
     CALL MPI_WIN_ALLOCATE_SHARED(sizeOfPve, dispUnit2, MPI_INFO_NULL, parentIntracomm, pveCPtr, win, ierror)
     CALL MPI_WIN_SHARED_QUERY(win, masterProc, sizeOfPve, dispUnit2, pveCPtr, ierror)
     CALL C_F_POINTER(pveCPtr, pve_gridPtr)

     CALL MPI_BARRIER(parentIntracomm, ierror)
     CALL MPI_WIN_LOCK(MPI_LOCK_EXCLUSIVE, masterProc, 0, win, ierror)
     pve_gridPtr%flag = .false.
     pve_gridPtr%intensity(1) = 42
     CALL MPI_WIN_UNLOCK(masterProc, win, ierror)
     CALL MPI_BARRIER(parentIntracomm, ierror)
     CALL MPI_FINALIZE(ierror)
  END PROGRAM sharedStructWorker

Compilation with:

mpiifort   -c  sharedStructMethod.f90
mpiifort   -o  sharedStructWorker.x sharedStructWorker.f90 sharedStructMethod.o
mpiifort   -o  sharedStruct.x sharedStruct.f90 sharedStructMethod.o

Is this the right approach, or do I need to create a shared memory block with its own window for every pointer component of the derived datatype pve_data? Thank you for your help!

EDIT 10/09/2016, Solution: Explanation in the comments. One way to solve the problem is to create a separate window for every component, on which parent and children then work. For more complex derived datatypes this quickly becomes tedious to implement, but it seems there is no other choice.
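A minimal sketch of that per-component approach on the parent side, assuming the same childIntracomm as in sharedStruct.f90 above; the window and pointer names are illustrative, and the byte counts assume 8-byte reals and 4-byte default logicals:

```fortran
! Parent side: one shared window per component instead of one window
! for the whole derived type (names like intensityWin are illustrative).
INTEGER(KIND=MPI_ADDRESS_KIND) :: intensitySize, flagSize
INTEGER                        :: intensityWin, flagWin, ierror
TYPE(C_PTR)                    :: intensityCPtr, flagCPtr
REAL(kind(prec)), POINTER      :: intensity(:)
LOGICAL, POINTER               :: flag

intensitySize = 8 * CONSTSIZE        ! assumes 8-byte reals
CALL MPI_WIN_ALLOCATE_SHARED(intensitySize, 8, MPI_INFO_NULL, &
                             childIntracomm, intensityCPtr, intensityWin, ierror)
CALL C_F_POINTER(intensityCPtr, intensity, [CONSTSIZE])

flagSize = 4                         ! assumes 4-byte default LOGICAL
CALL MPI_WIN_ALLOCATE_SHARED(flagSize, 4, MPI_INFO_NULL, &
                             childIntracomm, flagCPtr, flagWin, ierror)
CALL C_F_POINTER(flagCPtr, flag)
```

The children would then call MPI_WIN_SHARED_QUERY once per window, as sharedStructWorker.f90 does for the single window, and apply C_F_POINTER to each component separately.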


Solution

  • I think this is mostly covered here: MPI Fortran code: how to share data on node via openMP?

    So, in terms of 1) you can have base pointers that are Fortran derived types. However, the answer to 2) is that MPI_Win_allocate_shared returns storage to you - you cannot reuse existing storage. Given that you have a linked list, I don't see how it could be converted to a shared window even in principle. To be able to make use of the returned storage it would be much simpler to have an array of pve_data objects - you are going to have to store them consecutively in the returned array anyway, so linking them doesn't seem to add anything useful.

    I might have misunderstood here - if you only want the head of the list to be remotely accessible in the window, then that should be OK.
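One way to follow the answer's suggestion is a sketch along these lines: give the record a fixed size by dropping the POINTER attributes, so an array of records fits contiguously into the storage that MPI_Win_allocate_shared returns. Here pve_fixed is a hypothetical type name and N (the number of records) is assumed to be defined elsewhere:

```fortran
! Hypothetical fixed-size variant of pve_data: no POINTER components,
! so N records can live back-to-back in the shared window.
type :: pve_fixed
   real(kind(prec)) :: intensity(CONSTSIZE)
   logical          :: flag
   type(vertex)     :: vertices(CONSTSIZE)
end type pve_fixed

TYPE(pve_fixed)                :: proto
TYPE(pve_fixed), POINTER       :: grid(:)
INTEGER(KIND=MPI_ADDRESS_KIND) :: memSize
INTEGER                        :: win, ierror
TYPE(C_PTR)                    :: basePtr

! STORAGE_SIZE returns bits, hence the division by 8.
memSize = INT(N, MPI_ADDRESS_KIND) * STORAGE_SIZE(proto) / 8
CALL MPI_WIN_ALLOCATE_SHARED(memSize, 1, MPI_INFO_NULL, &
                             childIntracomm, basePtr, win, ierror)
CALL C_F_POINTER(basePtr, grid, [N])   ! grid(1:N) now lives in the window
```

With this layout a single window covers all records, and remote writes by the children are visible to the parent because the data itself - not just pointers to private memory - resides in the shared segment.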