Let's say I have N files in a format like this:
One file looks like this:
For each time there is some amount of data, each with a different id.
- time 1:
- data with id: 10
- data with id: 13
- data with id: 4
- time 2:
- data with id: 10
- data with id: 77
...etc
(For each time, the data with ids 1-1000 are spread somehow (mixed) over these N files.)
I would like to combine all these N files into a single file which is ordered:
Final File:
- time 1:
- data with id: 1
- data with id: 2
- data with id: 3
- ...
- data with id: 1000
- time 2:
- data with id: 1
- data with id: 2
- data with id: 3
- ...
- data with id: 1000
...etc
The data for ids 1-1000 at a single time is approximately 100 MB, but I have a lot of times, which adds up to as much as 50 GB of data.
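For concreteness, I think of each entry as a fixed-size record, roughly like this (just an illustration; the exact payload layout doesn't matter):

    /* hypothetical record: ~100 KB of payload per id, so ids 1-1000
       together come to about 100 MB per time */
    typedef struct {
        int id;                 /* 1..1000 */
        char payload[100*1024];
    } item;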
My solution so far, to make this as fast as possible, looks like this:
I use T threads on a supercomputer node (one machine with e.g. 24-48 cores). I have allocated a shared memory array that holds all data with ids 1-1000 for one time (it could also hold more if I wanted).
Procedure:
Step 1: The threads read the N files in parallel, and all data for the current time is sent to one rank, which sorts it by id into the shared memory array (see the sketch below).
Step 2: Once the array for the current time is complete, it is written to the final file in parallel with MPI-IO, and we continue with the next time.
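In MPI terms, my step 1 is basically a gather to one rank. A rough sketch of what I have in mind (my_items, my_nbytes, shared_array, rank, and nranks stand in for my real buffers and process bookkeeping):

    /* step 1: each rank packs its items for the current time into
       my_items (my_nbytes bytes); rank 0 collects everything into
       the shared array */
    int *counts = malloc(nranks * sizeof(int));
    int *displs = malloc(nranks * sizeof(int));
    MPI_Gather(&my_nbytes, 1, MPI_INT, counts, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        int off = 0;
        for (int r = 0; r < nranks; r++) { displs[r] = off; off += counts[r]; }
    }
    MPI_Gatherv(my_items, my_nbytes, MPI_BYTE,
                shared_array, counts, displs, MPI_BYTE, 0, MPI_COMM_WORLD);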
Thanks for any inputs!
Your approach will probably work OK for a moderate amount of data, but you've made one rank the central point of communication here. That's not going to scale terribly well.
You're on the right track with your part 2: a parallel write using MPI-IO sounds like a good approach to me. Here's how that might go:
First, does every process know the maximum ID and the maximum number of timesteps? That is, if one process holds
    data with id: 4
does it know that some other processes have ids 1, 2, 3, and 5? If so, then every process knows where its data has to go. If you don't know the max ID and the max timesteps, you'd have to do a bit of communication (MPI_Allreduce with MPI_MAX as the operation) to find that.
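That reduction is a one-liner. A minimal sketch, assuming each process tracks its local maxima in local_max_id and local_max_time:

    /* every process contributes its local maxima and receives the
       global maxima */
    int local_max[2] = { local_max_id, local_max_time };
    int global_max[2];
    MPI_Allreduce(local_max, global_max, 2, MPI_INT, MPI_MAX, MPI_COMM_WORLD);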
With these preliminaries, you can set an MPI-IO "file view", probably using MPI_Type_indexed.
On rank 0, this gets a bit more complicated because you need to add the timestep markers to your list of data. Or, you can define a file format with an index of timesteps, and store that index in a header or footer.
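For the footer variant, a sketch might look like this (data_end_offset, time_offsets, and ntimes are assumptions: the byte length of the data section and an array holding each timestep's starting offset):

    /* reset the view so offsets are plain bytes again, then have
       rank 0 append the timestep index after the data section */
    MPI_File_set_view(fh, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
    if (rank == 0)
        MPI_File_write_at(fh, data_end_offset, time_offsets,
                          ntimes, MPI_OFFSET, MPI_STATUS_IGNORE);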
The code would look roughly like this:
    for (i = 0; i < nitems; i++) {
        datalen[i] = sizeof(item);
        /* index_of_item: this item's global slot in the file,
           e.g. timestep*max_id + (id-1) */
        offsets[i] = sizeof(item)*index_of_item;
    }
    MPI_Type_indexed(nitems, datalen, offsets, MPI_BYTE, &filetype);
    MPI_Type_commit(&filetype);   /* commit before using the type */
    MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buffer, nitems*sizeof(item), MPI_BYTE, &status);
The _all bit here is important: you're going to create a highly non-contiguous, irregular access pattern from each MPI process. Give the MPI-IO library a chance to optimize that request.
Also, it's important to note that the offsets in an MPI-IO file view must be monotonically non-decreasing, so you'll have to sort the items locally before writing the data out collectively. Local memory operations have an insignificant cost relative to an I/O operation, so this usually isn't a big deal.
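The local sort can be a plain qsort over each item's target slot, done before you fill in datalen/offsets. A sketch, assuming each item carries a slot field with its global position in the file:

    #include <stdlib.h>

    /* comparator: order items by their target slot in the final file */
    static int compare_slot(const void *a, const void *b) {
        const item *x = a, *y = b;
        return (x->slot > y->slot) - (x->slot < y->slot);
    }

    /* sort locally, then build datalen[]/offsets[] from the sorted order */
    qsort(items, nitems, sizeof(item), compare_slot);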