Am I misusing mpi_file_write_all?

I have a snippet of code that writes data to a file with MPI-IO. It works well when I use mpi_file_write, but if I switch to the collective mpi_file_write_all I get the wrong result. Do I have to change more than just the call to the write function to use the collective writing routine?

With mpi_file_write the file contains the expected result: four lines of the form "1 2 3 4". od prints the first one and abbreviates the identical repeats as "*".

$ od -f TEST
0000000               1               2               3               4
*
0000100

But with mpi_file_write_all the resulting file is different, and the data are in the wrong order:

$ od -f TEST
0000000               1               1               2               3
0000020               2               4               3               4
0000040               1               2               1               3
0000060               2               3               4               4
0000100

So I am wondering if I have done something wrong. Is there some difference between mpi_file_write and mpi_file_write_all that I have missed?

I am using Open MPI 3.0.

      PROGRAM INDEXED
      USE MPI
      IMPLICIT NONE
      REAL :: A(4)
      INTEGER :: INDEXTYPE,FH,IERR,L,N
      INTEGER(KIND=MPI_OFFSET_KIND) :: OFFSET
      CHARACTER(LEN=MPI_MAX_LIBRARY_VERSION_STRING) :: VERSION

      N=4
      A(1)=1.0
      A(2)=2.0
      A(3)=3.0
      A(4)=4.0

      CALL MPI_INIT(IERR)
      CALL MPI_GET_LIBRARY_VERSION(VERSION,L,IERR)
      WRITE(*,*)TRIM(VERSION)
      CALL CREATE_TYPE(INDEXTYPE,N)

      CALL MPI_FILE_OPEN(MPI_COMM_WORLD, "TEST",
     &  MPI_MODE_RDWR+MPI_MODE_CREATE, MPI_INFO_NULL,FH,IERR)
      CALL MPI_CHECK_CALL(IERR)

      OFFSET=0
      CALL MPI_FILE_SET_VIEW(FH, OFFSET,MPI_REAL,
     &                       INDEXTYPE,'NATIVE',
     &                       MPI_INFO_NULL, IERR)
      CALL MPI_CHECK_CALL(IERR)

      CALL MPI_FILE_WRITE(FH,A,N,MPI_REAL,
     &                    MPI_STATUS_IGNORE,IERR)
!     CALL MPI_FILE_WRITE_ALL(FH,A,N,MPI_REAL,
!    &                        MPI_STATUS_IGNORE,IERR)

      CALL MPI_CHECK_CALL(IERR)
      CALL MPI_FILE_CLOSE(FH,IERR)
      CALL MPI_CHECK_CALL(IERR)

      CALL MPI_FINALIZE(IERR) 
      END PROGRAM INDEXED

      SUBROUTINE CREATE_TYPE(DATARES_TYPE,N)
        USE MPI
        IMPLICIT NONE
        INTEGER, INTENT(OUT) :: DATARES_TYPE
        INTEGER, INTENT(IN) :: N
        INTEGER :: IERR, MY_RANK
        INTEGER, ALLOCATABLE :: BLOCKLENS(:), DISPLACEMENTS(:)
        ALLOCATE(BLOCKLENS(N))
        ALLOCATE(DISPLACEMENTS(N))
        BLOCKLENS = 1
        CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
        IF(MY_RANK==0)THEN
          DISPLACEMENTS(1)=0
          DISPLACEMENTS(2)=5
          DISPLACEMENTS(3)=2
          DISPLACEMENTS(4)=3
        ENDIF
        IF(MY_RANK==1)THEN
          DISPLACEMENTS(1)=4
          DISPLACEMENTS(2)=1
          DISPLACEMENTS(3)=6
          DISPLACEMENTS(4)=7
        ENDIF
        IF(MY_RANK==2)THEN
          DISPLACEMENTS(1)=8
          DISPLACEMENTS(2)=9
          DISPLACEMENTS(3)=14
          DISPLACEMENTS(4)=11
        ENDIF
        IF(MY_RANK==3)THEN
          DISPLACEMENTS(1)=12
          DISPLACEMENTS(2)=13
          DISPLACEMENTS(3)=10
          DISPLACEMENTS(4)=15
        ENDIF

        CALL MPI_TYPE_INDEXED(N, BLOCKLENS, DISPLACEMENTS,
     &                        MPI_REAL, DATARES_TYPE, IERR)
        CALL MPI_CHECK_CALL(IERR)
        CALL MPI_TYPE_COMMIT(DATARES_TYPE, IERR)
        CALL MPI_CHECK_CALL(IERR)
        DEALLOCATE(BLOCKLENS)
        DEALLOCATE(DISPLACEMENTS)
      END SUBROUTINE

      SUBROUTINE MPI_CHECK_CALL(IERR)
        USE MPI
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: IERR
        INTEGER :: NERR, RESULTLEN
        CHARACTER(LEN=MPI_MAX_ERROR_STRING) :: SERR
        IF(IERR /= MPI_SUCCESS) THEN
          CALL MPI_ERROR_STRING(IERR,SERR,RESULTLEN,NERR)
          WRITE(*,*)SERR
          CALL BACKTRACE
        END IF
      END SUBROUTINE

Solution

You are indeed not using MPI_File_set_view() correctly. From the standard (MPI 3.1, chapter 13.3; thanks to Wei-keng Liao for the pointer):

An etype (elementary datatype) is the unit of data access and positioning. It can be any MPI predefined or derived datatype. Derived etypes can be constructed by using any of the MPI datatype constructor routines, provided all resulting typemap displacements are non-negative and monotonically nondecreasing.

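Your derived datatype does not meet this requirement. The standard applies the same rule to the filetype passed to MPI_File_set_view() (INDEXTYPE here): its displacements must be non-negative and monotonically nondecreasing. In your code, however, the DISPLACEMENTS array is out of order on every rank (for example 0, 5, 2, 3 on rank 0).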
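Because the question uses block lengths of 1, this rule reduces to checking that each rank's DISPLACEMENTS array is non-negative and sorted in nondecreasing order. A minimal sketch of such a check in plain Fortran (no MPI needed). The module and function names are hypothetical illustrations, not part of any MPI library, and the check does not cover block lengths greater than 1:

```fortran
      MODULE DISP_CHECK
      ! Hypothetical helper (not part of MPI). Assumes every block
      ! length is 1, as in the question's CREATE_TYPE.
      IMPLICIT NONE
      CONTAINS
      LOGICAL FUNCTION IS_VALID_FILETYPE_DISP(DISP)
        INTEGER, INTENT(IN) :: DISP(:)
        INTEGER :: I
        IS_VALID_FILETYPE_DISP = .TRUE.
        IF (SIZE(DISP) == 0) RETURN
        ! The first displacement must not be negative
        IF (DISP(1) < 0) THEN
          IS_VALID_FILETYPE_DISP = .FALSE.
          RETURN
        END IF
        DO I = 2, SIZE(DISP)
          ! Each displacement must be >= the previous one
          IF (DISP(I) < DISP(I-1)) THEN
            IS_VALID_FILETYPE_DISP = .FALSE.
            RETURN
          END IF
        END DO
      END FUNCTION IS_VALID_FILETYPE_DISP
      END MODULE DISP_CHECK
```

Run against the question's arrays, the check rejects all four ranks (for example 0, 5, 2, 3), while a sorted array such as 0, 2, 3, 5 passes.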

FWIW:

With ROMIO from MPICH, the program crashes. I reported the issue at https://github.com/pmodels/mpich/issues/2915, because ROMIO should really return an error instead.

With the latest Open MPI v3.0.x (which uses OMPIO by default), the program crashes or hangs, but it works fine with the v3.1.x and master branches. Note that the right fix is to correct your code. Future Open MPI versions are expected to report an error in this case.
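One way to make the code conform is to build the filetype from sorted displacements and reorder the data in memory to match. This is offered as an illustration, not as the answerer's own fix. For example, rank 0 would use displacements 0, 2, 3, 5 and write A(1), A(3), A(4), A(2), which puts the same values at the same file positions as before. Below is a minimal sketch of that bookkeeping, assuming block lengths of 1 as in the question. SORT_DISP and its module are hypothetical helpers, not MPI routines:

```fortran
      MODULE DISP_SORT
      ! Hypothetical helper (not part of MPI): sorts displacements
      ! ascending and records in PERM where each one came from.
      IMPLICIT NONE
      CONTAINS
      SUBROUTINE SORT_DISP(DISP, PERM)
        INTEGER, INTENT(INOUT) :: DISP(:)
        INTEGER, INTENT(OUT) :: PERM(SIZE(DISP))
        INTEGER :: I, J, D, P
        ! Start with the identity permutation 1, 2, ..., N
        PERM = (/ (I, I = 1, SIZE(DISP)) /)
        ! Insertion sort that moves PERM along with DISP
        DO I = 2, SIZE(DISP)
          D = DISP(I)
          P = PERM(I)
          J = I - 1
          DO WHILE (J >= 1)
            ! Separate test: Fortran .AND. does not short-circuit
            IF (DISP(J) <= D) EXIT
            DISP(J+1) = DISP(J)
            PERM(J+1) = PERM(J)
            J = J - 1
          END DO
          DISP(J+1) = D
          PERM(J+1) = P
        END DO
      END SUBROUTINE SORT_DISP
      END MODULE DISP_SORT
```

To apply this to the question's program, CREATE_TYPE would call SORT_DISP on DISPLACEMENTS before MPI_TYPE_INDEXED and return PERM to the caller. The caller would then pass a reordered copy such as B = A(PERM) to MPI_FILE_WRITE_ALL instead of A. Insertion sort is used only because N is tiny. Any sort that carries the permutation along would work.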