I am trying to read a binary file containing a sequence of char and double. (For example 0 0.125 1 1.4 0 2.3 1 4.5, but written in a binary file). I created a simple struct input, and also an MPI Datatype I will call mpi_input corresponding to this struct.
typedef struct { char type; double value; } input;
I would like to read my file in parallel (i.e. here using different processors) using MPI_File_read_at_all, and I would like to use the datatype mpi_input in this function.
The problem is that, I think, this function needs a buffer that it will write into until the end. I tried using an input *buffer, but this creates issues due to data structure alignment. Do you have any ideas on how to do this?
Here is a minimal working example:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <assert.h>
#include <stddef.h>
int main(int argc, char** argv)
{
    typedef struct
    {
        double val;
        char type;
    } input;

    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    assert(size == 4);

    MPI_File in;
    MPI_Offset filesize;
    MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, MPI_INFO_NULL, &in);
    MPI_File_get_size(in, &filesize);
    int mysize = filesize/size;
    int globalstart = rank * mysize;
    input *chunk = malloc(sizeof(input)*2);

    /* MPI datatype for one record: a byte followed by a double */
    int blocks[2] = {1,1};
    MPI_Datatype types[2] = {MPI_BYTE, MPI_DOUBLE};
    MPI_Aint displacements[2];
    MPI_Datatype cell_type;
    displacements[0] = offsetof(input, type);
    displacements[1] = offsetof(input, val);
    MPI_Type_create_struct(2, blocks, displacements, types, &cell_type);
    MPI_Type_commit(&cell_type);

    MPI_File_read_at_all(in, globalstart, chunk, mysize, cell_type, MPI_STATUS_IGNORE);

    if(rank == 0)
        printf("0 - Got %d %f\n", chunk->type, chunk->val);
    if(rank == 4)
        printf("Got %d %f\n", chunk->type, chunk->val);

    MPI_File_close(&in);
    MPI_Finalize();
}
And here is code to generate a simple binary file:
#include <stdio.h>
#include <stdlib.h>
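int main(void)
{
    const char *filename = "test.dump";
    FILE *fp = fopen(filename, "wb");
    if (fp == NULL)
        return EXIT_FAILURE;

    /* Write 8 records: a char (8, 7, ..., 1) followed by a double (0.0, ..., 7.0) */
    char bla = 8;
    for(double i = 0; i < 8; i++)
    {
        fwrite(&bla, sizeof(char), 1, fp);
        bla--;
        fwrite(&i, sizeof(double), 1, fp);
    }
    fclose(fp);
    return 0;
}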
You are providing MPI_File_read_at_all with the wrong arguments. In MPI, arguments that relate to the data to be sent/received/read/written are almost always given as a triplet of the following form: buffer, #elements, datatype.
In your case, #elements equals mysize, which is a size in bytes and not a number of elements of datatype cell_type. As a result, the function reads more elements than can fit inside the buffer and thus corrupts the heap.
What you should do instead is divide mysize by the size of the datatype (and that's not sizeof(input)!):
int cell_type_size;
MPI_Type_size(cell_type, &cell_type_size);
...
MPI_File_read_at_all(in, globalstart,
    chunk, mysize / cell_type_size, cell_type, MPI_STATUS_IGNORE);
//  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//  data specification triplet
Also, your second printf statement will never execute, since rank varies from 0 to 3 in the case of 4 MPI processes.