Tags: io, fortran, mpi, mpi-io

Using MPI-IO to write Fortran-formatted files


I am trying to save a solution using the OVERFLOW-PLOT3D q-file format (defined here: http://overflow.larc.nasa.gov/files/2014/06/Appendix_A.pdf). For a single grid, it is basically,

READ(1) NGRID
READ(1) JD,KD,LD,NQ,NQC
READ(1) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF, &
        IGAM,HTINF,HT1,HT2,RGAS1,RGAS2, &
        FSMACH,TVREF,DTVREF
READ(1) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)    

All of the variables are double-precision numbers, except for NGRID, JD, KD, LD, NQ, NQC and IGAM, which are integers. I need to use MPI-IO to export the solution. If I take a very simple example with a single processor, the following code does not work, but I do not understand why.

call mpi_file_open( mpi_comm_world, fileOut, mpi_mode_wronly + mpi_mode_create, &
                  mpi_info_null, mpi_fh, ierr )
offset = 0
call mpi_file_seek( mpi_fh, offset, mpi_seek_set, ierr )
call mpi_file_write( mpi_fh, (/NGRID,JD,KD,LD,NQ,NQC/), 6, mpi_integer, mstat, ierr )
call mpi_file_write( mpi_fh, (/REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF/), 7, mpi_double_precision, mstat, ierr )
call mpi_file_write( mpi_fh, IGAM, 1, mpi_integer, mstat, ierr )
call mpi_file_write( mpi_fh, (/HTINF,HT1,HT2,RGAS1,RGAS2,FSMACH,TVREF,DTVREF/), 8, mpi_double_precision, mstat, ierr )

call mpi_file_write( mpi_fh, Q, NQ*JD*KD*LD, mpi_double_precision, mstat, ierr )

Tecplot does not recognize the format. However, if I write a simple non-MPI code such as this one:

open(2, file=fileOut, form='unformatted', convert='little_endian')
write(2) NGRID
write(2) JD, KD, LD, NQ, NQC
write(2) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF, &
         IGAM,HTINF,HT1,HT2,RGAS1,RGAS2, &
         FSMACH,TVREF,DTVREF
write(2) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)

everything works just fine. What is wrong with my MPI-IO code?? Thank you very much for your help!

Joachim

NB: I do not know if this is relevant, but if I add an mpi_file_seek(offset) just before the final write statement, with offset=144, Tecplot agrees to load the file (but the data is not read correctly). This is strange, because the normal offset should be 7 integers + 15 real*8 = 148 bytes...

EDIT: Your approach, @Jonathan Dursi, does not seem to work with Tecplot for some reason. Is there anything wrong with the following code? (simplified for a single processor)

 call MPI_File_write(fileh, [4, ngrid, 4], 3, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [20, jd, kd, ld, nq, nqc, 20], 7, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [56], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [refmach,alpha,rey,time,gaminf,beta,tinf], 7, MPI_double_precision, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [56], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [4, IGAM, 4], 3, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [64], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [HTINF,HT1,HT2,RGAS1,RGAS2,FSMACH,TVREF,DTVREF], 8, MPI_double_precision, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [64], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, q, jd*kd*ld*nq, MPI_double_precision, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)

Solution

  • @francescalus is right - Fortran sequential unformatted data is record-based, which is actually really nice for a lot of things, but nothing else uses it (even MPI-IO in Fortran is more C-like: the file is just a big, long stream of undifferentiated bytes).

    Let's take a look at a simplified version of your writing program in the question:

    program testwrite
    
    integer, parameter:: ngrid=2
    integer, parameter:: jd=4, kd=3, ld=2, nq=1, nqc=-1
    
    integer, parameter :: refmach=1, alpha=2, rey=3, time=4, gaminf=5
    integer, parameter :: beta=6, tinf=7
    
    integer, dimension(jd,kd,ld,nq) :: q
    q = 0
    
    open(2, file='ftest.dat', form='unformatted', convert='little_endian')
    write(2) NGRID
    write(2) JD, KD, LD, NQ, NQC
    write(2) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF
    write(2) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)
    close(2)
    
    end program testwrite
    

    Running this and taking a look at the resulting binary file with od (I've made everything an integer for clarity):

    $ gfortran -o fwrite fwrite.f90 
    $ ./fwrite 
    $ od --format "d" ftest.dat 
    0000000           4           2           4          20
    0000020           4           3           2           1
    0000040          -1          20          28           1
    0000060           2           3           4           5
    0000100           6           7          28          96
    0000120           0           0           0           0
    *
    0000260          96
    0000264
    

    We see, for instance, the ngrid (2) integer there at the start, bookended by 4/4 - the size of the record in bytes. Then, bookended by 20/20, we see the 5 integers (5*4 bytes) 4,3,2,1,-1 -- jd, kd, ld, nq, nqc; the seven remaining header values follow the same pattern, bookended by 28/28. Towards the end, we see a bunch of zeros bookended by 96 (= 4 bytes/integer * 4*3*2*1), which represents q. (Note that there's no standard that defines this behaviour, but I'm unaware of any major Fortran compiler that doesn't do it this way; however, when records get larger than can be described by a 4-byte integer, behaviour starts to differ.)
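
    As a quick sanity check - this snippet is not part of the original answer - here's a minimal sketch that walks the same file with stream access and prints each record's leading and trailing markers, assuming the usual 4-byte markers shown above:

    program checkmarkers

      implicit none
      integer :: lead, trail, pos, fsize

      ! open the file written above as a raw stream of bytes (no record handling)
      open(10, file='ftest.dat', access='stream', form='unformatted', status='old')
      inquire(unit=10, size=fsize)

      pos = 1
      do while (pos < fsize)
         read(10, pos=pos) lead               ! leading record marker
         read(10, pos=pos+4+lead) trail       ! trailing record marker
         print '(a,i0,a,i0,a,i0)', 'record at byte ', pos-1, &
               ': payload ', lead, ' bytes, trailing marker ', trail
         pos = pos + lead + 8                 ! marker + payload + marker
      end do

      close(10)

    end program checkmarkers

    On ftest.dat this should report the same 4-, 20-, 28- and 96-byte payloads that the bookends in the od dump show.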

    We can use the following simple program to test the data file:

    program testread
    
    implicit none
    
    integer :: ngrid
    integer :: jd, kd, ld, nq, nqc
    
    integer :: refmach, alpha, rey, time, gaminf
    integer :: beta, tinf
    
    integer :: j, k, l, n
    
    integer, allocatable, dimension(:,:,:,:) :: q
    character(len=64) :: filename
    
    if (command_argument_count() < 1) then
        print *,'Usage: read [filename]'
    else 
        call get_command_argument(1, filename)
        open(2, file=trim(filename), form='unformatted', convert='little_endian')
        read(2) NGRID
        read(2) JD, KD, LD, NQ, NQC
        read(2) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF
    
        allocate(q(jd, kd, ld, nq))
        read(2) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)
        close(2)
    
        print *, 'Ngrid = ', ngrid
        print *, 'jd, kd, ld, nq, nqc = ', jd, kd, ld, nq, nqc
    
        print *, 'q: min/mean/max = ', minval(q), sum(q)/size(q), maxval(q)
        deallocate(q)
    endif
    
    end program testread
    

    and running gives

    $ ./fread ftest.dat 
     Ngrid =            2
     jd, kd, ld, nq, nqc =            4           3           2           1          -1
     q: min/mean/max =            0           0           0
    

    Simple enough.

    So this behaviour is pretty easy to mimic in MPI-IO. There are really three parts here - the header, Q (which I assume to be distributed with, say, MPI subarrays), and the footer (which is just the trailing bookend of the array).

    So let's take a look at an MPI-IO program in Fortran that would do the same thing:

    program mpiwrite
    
      use mpi
      implicit none
    
      integer, parameter:: ngrid=2
      integer, parameter:: jd=3, kd=3, ld=3, nlocq=3, nqc=-1
      integer :: nq
    
      integer, parameter :: refmach=1, alpha=2, rey=3, time=4, gaminf=5
      integer, parameter :: beta=6, tinf=7
    
      integer, dimension(jd,kd,ld,nlocq) :: q
    
      integer :: intsize
      integer :: subarray
    
      integer :: fileh
      integer(kind=MPI_Offset_kind) :: offset
    
      integer :: comsize, rank, ierr
    
      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    
      nq = nlocq * comsize
      q = rank
    
      ! create a subarray; each processor gets its own q-slice of the
      ! global array
      call MPI_Type_create_subarray (4, [jd, kd, ld, nq], [jd, kd, ld, nlocq], &
                                        [0, 0, 0, nlocq*rank], &
                                        MPI_ORDER_FORTRAN, MPI_INTEGER, subarray,  ierr)
      call MPI_Type_commit(subarray, ierr)
    
      call MPI_File_open(MPI_COMM_WORLD, 'mpi.dat',         &
                         MPI_MODE_WRONLY + MPI_MODE_CREATE, &
                         MPI_INFO_NULL, fileh, ierr )
    
      ! the header size is:
      !  1 field of 1 integer ( = 4*(1 + 1 + 1) = 12 bytes )
      ! +1 field of 5 integers( = 4*(1 + 5 + 1) = 28 bytes )
      ! +1 field of 7 integers( = 4*(1 + 7 + 1) = 36 bytes )
      ! +first bookend of array size = 4 bytes
      offset = 12 + 28 + 36 + 4
    
      ! rank 0 writes the header and footer
      if (rank == 0) then
          call MPI_File_write(fileh, [4, ngrid, 4], 3, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [20, jd, kd, ld, nq, nqc, 20], 7, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, &
                            [28, refmach, alpha, rey, time, gaminf, beta, tinf, 28],&
                             9, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
    
          call MPI_File_write(fileh, [jd*kd*ld*nq*4], 1, MPI_INTEGER,  &
                             MPI_STATUS_IGNORE, ierr)
          call MPI_File_seek(fileh, offset+jd*kd*ld*nq*4, MPI_SEEK_SET, ierr)
          call MPI_File_write(fileh, [jd*kd*ld*nq*4], 1, MPI_INTEGER,  &
                             MPI_STATUS_IGNORE, ierr)
      endif
    
      ! now everyone dumps their part of the array
      call MPI_File_set_view(fileh, offset, MPI_INTEGER, subarray,   &
                                    'native', MPI_INFO_NULL, ierr)
      call MPI_File_write_all(fileh, q, jd*kd*ld*nlocq, MPI_INTEGER, &
                                    MPI_STATUS_IGNORE, ierr)
    
      call MPI_File_close(fileh, ierr)
    
      CALL MPI_Finalize(ierr)
    
    end program mpiwrite
    

    In this program, process 0 is responsible for writing the header and the record-length fields. It starts off by writing the three header records, each bookended by the record length in bytes; then it writes the leading bookend for the big Q array, seeks past the space the distributed data will occupy, and writes the trailing bookend.

    Then, each rank sets the file view to first skip the header and then describe just its piece of the global array (filled here just with its rank number), and writes out its local data. These will all be non-overlapping pieces of data.
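
    As an aside (not from the original answer): because each rank's piece here is a contiguous slab of the last index, the same non-overlapping placement could also be written without a derived type, by skipping the MPI_File_set_view call and having every rank compute its own byte offset for a collective MPI_File_write_at_all. A two-line sketch, where myoffset is a hypothetical local of kind MPI_OFFSET_KIND and the factor of 4 is the integer size used in this example:

      myoffset = offset + int(jd, MPI_OFFSET_KIND)*kd*ld*nlocq*rank*4
      call MPI_File_write_at_all(fileh, myoffset, q, jd*kd*ld*nlocq, MPI_INTEGER, &
                                 MPI_STATUS_IGNORE, ierr)

    The subarray-plus-view version is the more general pattern, though, since it also handles decompositions that are not contiguous in the file.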

    So let's try this with a couple of different sizes:

    $ mpif90 -o mpifwrite mpifwrite.f90
    $ mpirun -np 1 ./mpifwrite
    
    $ ./fread mpi.dat
     Ngrid =            2
     jd, kd, ld, nq, nqc =            3           3           3           3          -1
     q: min/mean/max =            0           0           0
    
    $ od --format="d" mpi.dat
    0000000           4           2           4          20
    0000020           3           3           3           3
    0000040          -1          20          28           1
    0000060           2           3           4           5
    0000100           6           7          28         324
    0000120           0           0           0           0
    *
    0000620           0         324
    0000630
    
    $ mpirun -np 3 ./mpifwrite
    $ ./fread mpi.dat
     Ngrid =            2
     jd, kd, ld, nq, nqc =            3           3           3           9          -1
     q: min/mean/max =            0           1           2
    
    $ od --format="d" mpi.dat
    0000000           4           2           4          20
    0000020           3           3           3           9
    0000040          -1          20          28           1
    0000060           2           3           4           5
    0000100           6           7          28         972
    0000120           0           0           0           0
    *
    0000620           0           1           1           1
    0000640           1           1           1           1
    *
    0001320           1           1           2           2
    0001340           2           2           2           2
    *
    0002020           2           2           2         972
    0002040
    

    which is the output we expect. Extending things to multiple datatypes or multiple grids is relatively straightforward.
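
    For the actual double-precision q-file from the question, the pieces that change are the third header record - which packs 7 doubles, one integer and 8 more doubles into a single 124-byte record (7*8 + 4 + 8*8 = 124) - and the Q record, whose bookends now count 8-byte elements, so the header becomes 12 + 28 + 132 + 4 = 176 bytes. Here is a hedged sketch of the adapted writer; the program name, the file name 'qfile.dat', the grid dimensions and the freestream values are placeholders, not taken from the original post:

    program mpiwrite_q

      use mpi
      implicit none

      ! placeholder sizes: each rank owns nlocq of the nq planes in the last index
      integer, parameter :: ngrid = 1
      integer, parameter :: jd = 4, kd = 3, ld = 2, nlocq = 1, nqc = 0
      integer, parameter :: igam = 1
      integer :: nq

      ! placeholder freestream values, just to have something to write
      double precision, parameter :: refmach = 0.5d0, alpha = 0.d0, rey = 1.d6,   &
                                     time = 0.d0, gaminf = 1.4d0, beta = 0.d0,    &
                                     tinf = 1.d0
      double precision, parameter :: htinf = 0.d0, ht1 = 0.d0, ht2 = 0.d0,        &
                                     rgas1 = 0.d0, rgas2 = 0.d0, fsmach = 0.5d0,  &
                                     tvref = 0.d0, dtvref = 0.d0

      double precision, allocatable :: q(:,:,:,:)
      integer :: subarray, fileh
      integer(kind=MPI_OFFSET_KIND) :: offset
      integer :: comsize, rank, ierr

      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      nq = nlocq * comsize
      allocate(q(jd, kd, ld, nlocq))
      q = rank

      ! each rank owns a contiguous slab of the last (N) index of the global Q
      call MPI_Type_create_subarray(4, [jd, kd, ld, nq], [jd, kd, ld, nlocq],  &
                                    [0, 0, 0, nlocq*rank], MPI_ORDER_FORTRAN,  &
                                    MPI_DOUBLE_PRECISION, subarray, ierr)
      call MPI_Type_commit(subarray, ierr)

      call MPI_File_open(MPI_COMM_WORLD, 'qfile.dat',        &
                         MPI_MODE_WRONLY + MPI_MODE_CREATE,  &
                         MPI_INFO_NULL, fileh, ierr)

      ! header: 12 + 28 + (4 + 124 + 4) + leading Q bookend (4) = 176 bytes
      offset = 12 + 28 + 132 + 4

      ! rank 0 writes the header records and the Q bookends
      if (rank == 0) then
          call MPI_File_write(fileh, [4, ngrid, 4], 3, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [20, jd, kd, ld, nq, nqc, 20], 7, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)

          ! one 124-byte record holding 7 doubles, 1 integer and 8 doubles,
          ! built from several writes between a single pair of bookends
          call MPI_File_write(fileh, [124], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [refmach, alpha, rey, time, gaminf, beta, tinf], &
                              7, MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [igam], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [htinf, ht1, ht2, rgas1, rgas2, fsmach, tvref, dtvref], &
                              8, MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [124], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)

          ! bookends for the double-precision Q record
          call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
          call MPI_File_seek(fileh, offset + jd*kd*ld*nq*8, MPI_SEEK_SET, ierr)
          call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
      endif

      ! everyone dumps their slab of the double-precision array
      call MPI_File_set_view(fileh, offset, MPI_DOUBLE_PRECISION, subarray, &
                             'native', MPI_INFO_NULL, ierr)
      call MPI_File_write_all(fileh, q, jd*kd*ld*nlocq, MPI_DOUBLE_PRECISION, &
                              MPI_STATUS_IGNORE, ierr)

      call MPI_File_close(fileh, ierr)
      call MPI_Type_free(subarray, ierr)
      call MPI_Finalize(ierr)

    end program mpiwrite_q

    Reading the result back with the sequential READ statements from the question should then see the same record structure as a file written by the plain non-MPI code.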