Tags: io, fortran, mpi, mpi-io

Using MPI-IO to write Fortran-formatted files


I am trying to save a solution using the OVERFLOW-PLOT3D q-file format (defined here: http://overflow.larc.nasa.gov/files/2014/06/Appendix_A.pdf). For a single grid, it is basically,

READ(1) NGRID
READ(1) JD,KD,LD,NQ,NQC
READ(1) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF, &
        IGAM,HTINF,HT1,HT2,RGAS1,RGAS2, &
        FSMACH,TVREF,DTVREF
READ(1) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)    

All of the variables are double-precision numbers, except for NGRID, JD, KD, LD, NQ, NQC and IGAM, which are integers. I need to use MPI-IO to export the solution. If I take a very simple example with a single processor, the following code does not work, but I do not understand why.

call mpi_file_open( mpi_comm_world, fileOut, mpi_mode_wronly + mpi_mode_create, &
                  mpi_info_null, mpi_fh, ierr )
offset = 0
call mpi_file_seek( mpi_fh, offset, mpi_seek_set, ierr )
call mpi_file_write( mpi_fh, (/NGRID,JD,KD,LD,NQ,NQC/), 6, mpi_integer, mstat, ierr )
call mpi_file_write( mpi_fh, (/REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF/), 7, mpi_double_precision, mstat, ierr )
call mpi_file_write( mpi_fh, IGAM, 1, mpi_integer, mstat, ierr )
call mpi_file_write( mpi_fh, (/HTINF,HT1,HT2,RGAS1,RGAS2,FSMACH,TVREF,DTVREF/), 8, mpi_double_precision, mstat, ierr )

call mpi_file_write( mpi_fh, Q, NQ*JD*KD*LD, mpi_double_precision, mstat, ierr )

Tecplot does not recognize the format. However, if I write a simple non-MPI code such as this one:

open(2, file=fileOut, form='unformatted', convert='little_endian')
write(2) NGRID
write(2) JD, KD, LD, NQ, NQC
write(2) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF, &
         IGAM,HTINF,HT1,HT2,RGAS1,RGAS2, &
         FSMACH,TVREF,DTVREF
write(2) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)

everything works just fine. What is wrong with my MPI-IO code?? Thank you very much for your help!

Joachim

NB: I do not know if this is relevant, but if I add an mpi_file_seek(offset) just before the final write statement, with offset=144, Tecplot agrees to load the file (but the data is not read correctly). This is strange, because the normal offset should be 7 integers + 15 real*8 = 148 bytes...

EDIT: Your approach, @Jonathan Dursi, does not seem to work with Tecplot for some reason. Is there anything wrong with the following code? (simplified for a single processor)

 call MPI_File_write(fileh, [4, ngrid, 4], 3, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [20, jd, kd, ld, nq, nqc, 20], 7, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [56], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [refmach,alpha,rey,time,gaminf,beta,tinf], 7, MPI_double_precision, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [56], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [4, IGAM, 4], 3, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [64], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [HTINF,HT1,HT2,RGAS1,RGAS2,FSMACH,TVREF,DTVREF], 8, MPI_double_precision, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [64], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, q, jd*kd*ld*nq, MPI_double_precision, MPI_STATUS_IGNORE, ierr)
 call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)

Solution

  • @francescalus is right - Fortran sequential unformatted data is record-based, which is actually really nice for a lot of things, but nothing else uses it (even MPI-IO in Fortran is more C-like: the file is just a big, long stream of undifferentiated bytes).

    Let's take a look at a simplified version of your writing program in the question:

    program testwrite
    
    integer, parameter:: ngrid=2
    integer, parameter:: jd=4, kd=3, ld=2, nq=1, nqc=-1
    
    integer, parameter :: refmach=1, alpha=2, rey=3, time=4, gaminf=5
    integer, parameter :: beta=6, tinf=7
    
    integer, dimension(jd,kd,ld,nq) :: q
    q = 0
    
    open(2, file='ftest.dat', form='unformatted', convert='little_endian')
    write(2) NGRID
    write(2) JD, KD, LD, NQ, NQC
    write(2) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF
    write(2) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)
    close(2)
    
    end program testwrite
    

    Running this and taking a look at the resulting binary file with od (I've made everything an integer for clarity):

    $ gfortran -o fwrite fwrite.f90 
    $ ./fwrite 
    $ od --format "d" ftest.dat 
    0000000           4           2           4          20
    0000020           4           3           2           1
    0000040          -1          20          28           1
    0000060           2           3           4           5
    0000100           6           7          28          96
    0000120           0           0           0           0
    *
    0000260          96
    0000264
    

    We see, for instance, the ngrid (2) integer there at the start, bookended by 4/4 - the size of the record in bytes. Then, bookended by 20/20, we see the 5 integers (5*4 bytes) 4,3,2,1,-1 -- jd, kd, ld, nq, nqc; the seven remaining header values follow the same pattern, bookended by 28/28. Towards the end, we see a bunch of zeros bookended by 96 (= 4 bytes/integer * 4*3*2*1), which represents q. (Note that there's no standard that defines this behaviour, but I'm unaware of any major Fortran compiler that doesn't do it this way; however, when records get larger than can be described by a 4-byte integer, behaviour starts to differ.)
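
    As a quick sanity check - this snippet is not part of the original answer - here's a minimal sketch that walks the same file with stream access and prints each record's leading and trailing markers, assuming the usual 4-byte markers shown above:

    program checkmarkers

      implicit none
      integer :: lead, trail, pos, fsize

      ! open the file written above as a raw stream of bytes (no record handling)
      open(10, file='ftest.dat', access='stream', form='unformatted', status='old')
      inquire(unit=10, size=fsize)

      pos = 1
      do while (pos < fsize)
         read(10, pos=pos) lead               ! leading record marker
         read(10, pos=pos+4+lead) trail       ! trailing record marker
         print '(a,i0,a,i0,a,i0)', 'record at byte ', pos-1, &
               ': payload ', lead, ' bytes, trailing marker ', trail
         pos = pos + lead + 8                 ! marker + payload + marker
      end do

      close(10)

    end program checkmarkers

    On ftest.dat this should report the same 4-, 20-, 28- and 96-byte payloads that the bookends in the od dump show.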

    We can use the following simple program to test the data file:

    program testread
    
    implicit none
    
    integer :: ngrid
    integer :: jd, kd, ld, nq, nqc
    
    integer :: refmach, alpha, rey, time, gaminf
    integer :: beta, tinf
    
    integer :: j, k, l, n
    
    integer, allocatable, dimension(:,:,:,:) :: q
    character(len=64) :: filename
    
    if (command_argument_count() < 1) then
        print *,'Usage: read [filename]'
    else 
        call get_command_argument(1, filename)
        open(2, file=trim(filename), form='unformatted', convert='little_endian')
        read(2) NGRID
        read(2) JD, KD, LD, NQ, NQC
        read(2) REFMACH,ALPHA,REY,TIME,GAMINF,BETA,TINF
    
        allocate(q(jd, kd, ld, nq))
        read(2) ((((Q(J,K,L,N),J=1,JD),K=1,KD),L=1,LD),N=1,NQ)
        close(2)
    
        print *, 'Ngrid = ', ngrid
        print *, 'jd, kd, ld, nq, nqc = ', jd, kd, ld, nq, nqc
    
        print *, 'q: min/mean/max = ', minval(q), sum(q)/size(q), maxval(q)
        deallocate(q)
    endif
    
    end program testread
    

    and running gives

    $ ./fread ftest.dat 
     Ngrid =            2
     jd, kd, ld, nq, nqc =            4           3           2           1          -1
     q: min/mean/max =            0           0           0
    

    Simple enough.

    So this behaviour is pretty easy to mimic in MPI-IO. There are really three parts here - the header, Q (which I assume to be distributed with, say, MPI subarrays), and the footer (which is just the trailing bookend of the array).

    So let's take a look at an MPI-IO program in Fortran that would do the same thing:

    program mpiwrite
    
      use mpi
      implicit none
    
      integer, parameter:: ngrid=2
      integer, parameter:: jd=3, kd=3, ld=3, nlocq=3, nqc=-1
      integer :: nq
    
      integer, parameter :: refmach=1, alpha=2, rey=3, time=4, gaminf=5
      integer, parameter :: beta=6, tinf=7
    
      integer, dimension(jd,kd,ld,nlocq) :: q
    
      integer :: intsize
      integer :: subarray
    
      integer :: fileh
      integer(kind=MPI_Offset_kind) :: offset
    
      integer :: comsize, rank, ierr
    
      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    
      nq = nlocq * comsize
      q = rank
    
      ! create a subarray; each processor gets its own q-slice of the
      ! global array
      call MPI_Type_create_subarray (4, [jd, kd, ld, nq], [jd, kd, ld, nlocq], &
                                        [0, 0, 0, nlocq*rank], &
                                        MPI_ORDER_FORTRAN, MPI_INTEGER, subarray,  ierr)
      call MPI_Type_commit(subarray, ierr)
    
      call MPI_File_open(MPI_COMM_WORLD, 'mpi.dat',         &
                         MPI_MODE_WRONLY + MPI_MODE_CREATE, &
                         MPI_INFO_NULL, fileh, ierr )
    
      ! the header size is:
      !  1 field of 1 integer ( = 4*(1 + 1 + 1) = 12 bytes )
      ! +1 field of 5 integers( = 4*(1 + 5 + 1) = 28 bytes )
      ! +1 field of 7 integers( = 4*(1 + 7 + 1) = 36 bytes )
      ! +first bookend of array size = 4 bytes
      offset = 12 + 28 + 36 + 4
    
      ! rank 0 writes the header and footer
      if (rank == 0) then
          call MPI_File_write(fileh, [4, ngrid, 4], 3, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [20, jd, kd, ld, nq, nqc, 20], 7, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, &
                            [28, refmach, alpha, rey, time, gaminf, beta, tinf, 28],&
                             9, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
    
          call MPI_File_write(fileh, [jd*kd*ld*nq*4], 1, MPI_INTEGER,  &
                             MPI_STATUS_IGNORE, ierr)
          call MPI_File_seek(fileh, offset+jd*kd*ld*nq*4, MPI_SEEK_SET, ierr)
          call MPI_File_write(fileh, [jd*kd*ld*nq*4], 1, MPI_INTEGER,  &
                             MPI_STATUS_IGNORE, ierr)
      endif
    
      ! now everyone dumps their part of the array
      call MPI_File_set_view(fileh, offset, MPI_INTEGER, subarray,   &
                                    'native', MPI_INFO_NULL, ierr)
      call MPI_File_write_all(fileh, q, jd*kd*ld*nlocq, MPI_INTEGER, &
                                    MPI_STATUS_IGNORE, ierr)
    
      call MPI_File_close(fileh, ierr)
    
      CALL MPI_Finalize(ierr)
    
    end program mpiwrite
    

    In this program, process 0 is responsible for writing the header and the record-length fields. It starts off by writing the three header records, each bookended by the record length in bytes; then it writes the leading bookend for the big Q array, seeks past the space the distributed data will occupy, and writes the trailing bookend.

    Then, each rank sets the file view to first skip the header and then describe just its piece of the global array (filled here just with its rank number), and writes out its local data. These will all be non-overlapping pieces of data.
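
    As an aside (not from the original answer): because each rank's piece here is a contiguous slab of the last index, the same non-overlapping placement could also be written without a derived type, by skipping the MPI_File_set_view call and having every rank compute its own byte offset for a collective MPI_File_write_at_all. A two-line sketch, where myoffset is a hypothetical local of kind MPI_OFFSET_KIND and the factor of 4 is the integer size used in this example:

      myoffset = offset + int(jd, MPI_OFFSET_KIND)*kd*ld*nlocq*rank*4
      call MPI_File_write_at_all(fileh, myoffset, q, jd*kd*ld*nlocq, MPI_INTEGER, &
                                 MPI_STATUS_IGNORE, ierr)

    The subarray-plus-view version is the more general pattern, though, since it also handles decompositions that are not contiguous in the file.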

    So let's try this with a couple of different sizes:

    $ mpif90 -o mpifwrite mpifwrite.f90
    $ mpirun -np 1 ./mpifwrite
    
    $ ./fread mpi.dat
     Ngrid =            2
     jd, kd, ld, nq, nqc =            3           3           3           3          -1
     q: min/mean/max =            0           0           0
    
    $ od --format="d" mpi.dat
    0000000           4           2           4          20
    0000020           3           3           3           3
    0000040          -1          20          28           1
    0000060           2           3           4           5
    0000100           6           7          28         324
    0000120           0           0           0           0
    *
    0000620           0         324
    0000630
    
    $ mpirun -np 3 ./mpifwrite
    $ ./fread mpi.dat
     Ngrid =            2
     jd, kd, ld, nq, nqc =            3           3           3           9          -1
     q: min/mean/max =            0           1           2
    
    $ od --format="d" mpi.dat
    0000000           4           2           4          20
    0000020           3           3           3           9
    0000040          -1          20          28           1
    0000060           2           3           4           5
    0000100           6           7          28         972
    0000120           0           0           0           0
    *
    0000620           0           1           1           1
    0000640           1           1           1           1
    *
    0001320           1           1           2           2
    0001340           2           2           2           2
    *
    0002020           2           2           2         972
    0002040
    

    which is the output we expect. Extending things to multiple datatypes or multiple grids is relatively straightforward.
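
    For the actual double-precision q-file from the question, the pieces that change are the third header record - which packs 7 doubles, one integer and 8 more doubles into a single 124-byte record (7*8 + 4 + 8*8 = 124) - and the Q record, whose bookends now count 8-byte elements, so the header becomes 12 + 28 + 132 + 4 = 176 bytes. Here is a hedged sketch of the adapted writer; the program name, the file name 'qfile.dat', the grid dimensions and the freestream values are placeholders, not taken from the original post:

    program mpiwrite_q

      use mpi
      implicit none

      ! placeholder sizes: each rank owns nlocq of the nq planes in the last index
      integer, parameter :: ngrid = 1
      integer, parameter :: jd = 4, kd = 3, ld = 2, nlocq = 1, nqc = 0
      integer, parameter :: igam = 1
      integer :: nq

      ! placeholder freestream values, just to have something to write
      double precision, parameter :: refmach = 0.5d0, alpha = 0.d0, rey = 1.d6,   &
                                     time = 0.d0, gaminf = 1.4d0, beta = 0.d0,    &
                                     tinf = 1.d0
      double precision, parameter :: htinf = 0.d0, ht1 = 0.d0, ht2 = 0.d0,        &
                                     rgas1 = 0.d0, rgas2 = 0.d0, fsmach = 0.5d0,  &
                                     tvref = 0.d0, dtvref = 0.d0

      double precision, allocatable :: q(:,:,:,:)
      integer :: subarray, fileh
      integer(kind=MPI_OFFSET_KIND) :: offset
      integer :: comsize, rank, ierr

      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      nq = nlocq * comsize
      allocate(q(jd, kd, ld, nlocq))
      q = rank

      ! each rank owns a contiguous slab of the last (N) index of the global Q
      call MPI_Type_create_subarray(4, [jd, kd, ld, nq], [jd, kd, ld, nlocq],  &
                                    [0, 0, 0, nlocq*rank], MPI_ORDER_FORTRAN,  &
                                    MPI_DOUBLE_PRECISION, subarray, ierr)
      call MPI_Type_commit(subarray, ierr)

      call MPI_File_open(MPI_COMM_WORLD, 'qfile.dat',        &
                         MPI_MODE_WRONLY + MPI_MODE_CREATE,  &
                         MPI_INFO_NULL, fileh, ierr)

      ! header: 12 + 28 + (4 + 124 + 4) + leading Q bookend (4) = 176 bytes
      offset = 12 + 28 + 132 + 4

      ! rank 0 writes the header records and the Q bookends
      if (rank == 0) then
          call MPI_File_write(fileh, [4, ngrid, 4], 3, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [20, jd, kd, ld, nq, nqc, 20], 7, MPI_INTEGER, &
                              MPI_STATUS_IGNORE, ierr)

          ! one 124-byte record holding 7 doubles, 1 integer and 8 doubles,
          ! built from several writes between a single pair of bookends
          call MPI_File_write(fileh, [124], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [refmach, alpha, rey, time, gaminf, beta, tinf], &
                              7, MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [igam], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [htinf, ht1, ht2, rgas1, rgas2, fsmach, tvref, dtvref], &
                              8, MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
          call MPI_File_write(fileh, [124], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)

          ! bookends for the double-precision Q record
          call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
          call MPI_File_seek(fileh, offset + jd*kd*ld*nq*8, MPI_SEEK_SET, ierr)
          call MPI_File_write(fileh, [jd*kd*ld*nq*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
      endif

      ! everyone dumps their slab of the double-precision array
      call MPI_File_set_view(fileh, offset, MPI_DOUBLE_PRECISION, subarray, &
                             'native', MPI_INFO_NULL, ierr)
      call MPI_File_write_all(fileh, q, jd*kd*ld*nlocq, MPI_DOUBLE_PRECISION, &
                              MPI_STATUS_IGNORE, ierr)

      call MPI_File_close(fileh, ierr)
      call MPI_Type_free(subarray, ierr)
      call MPI_Finalize(ierr)

    end program mpiwrite_q

    Reading the result back with the sequential READ statements from the question should then see the same record structure as a file written by the plain non-MPI code.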