My question is about memory-mapped files: if all I need to do is keep appending data, does writing to a memory-mapped file give better write performance than writing the file to disk directly?
My analytics application generates a large amount of data, which is aggregated at the end, once all input lines have been processed.
When I process the input lines sequentially, I have no issues because I can do the aggregation and release the output data before I pick up the next input line.
The problem arises when I process the input lines in parallel: I have to keep all of the output data until every input line is complete. The output for about 100K input lines can be as large as 10GB, and the number of input lines can go up to 500K (roughly 50GB at the same rate), so keeping it all in memory is proving to be a challenge. I use serverGC, so garbage collection itself doesn't hurt performance much.
The next option I tried was to have each thread temporarily write its output to disk, then read it all back at the end and do the aggregation. As expected, though, this is proving to be very slow because of the disk writes.
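For reference, here is a minimal sketch of that per-thread temp-file approach, written in Python for brevity. The output format is not given, so `process_line` is a hypothetical stand-in that emits tab-separated key/value records, and the aggregation is assumed to be a per-key sum. One thing worth checking in the real code is write granularity: many small writes are typically much slower than a few large sequential ones, so each worker here writes through a 1 MiB buffer (an arbitrary size; in .NET the analogous knob is the `bufferSize` argument of `FileStream`).

```python
import os
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_line(line):
    # Hypothetical analytics step: one (key, value) record per word.
    return [(word, 1) for word in line.split()]


def worker(lines, tmp_dir, worker_id):
    # Each worker owns its own part file, so no locking is needed.
    # The 1 MiB buffer batches many small records into few write calls.
    path = os.path.join(tmp_dir, f"part-{worker_id}.tsv")
    with open(path, "w", buffering=1 << 20, encoding="utf-8") as out:
        for line in lines:
            for key, value in process_line(line):
                out.write(f"{key}\t{value}\n")
    return path


def aggregate(paths):
    # Final pass: stream every part file back and sum values per key.
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for record in f:
                key, value = record.rstrip("\n").split("\t")
                totals[key] += int(value)
    return totals


def run(lines, n_workers=4):
    # Split the input round-robin across workers.
    chunks = [lines[i::n_workers] for i in range(n_workers)]
    with tempfile.TemporaryDirectory() as tmp_dir:
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            paths = list(pool.map(worker, chunks,
                                  [tmp_dir] * n_workers, range(n_workers)))
        # Aggregate before the temporary directory is deleted.
        return aggregate(paths)
```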
Would using a memory-mapped file help in this situation? Or would you suggest another option, such as a database that writes to disk lazily, so the application doesn't take the performance hit?
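To make the memory-mapped option concrete, here is a rough Python sketch of an append-only writer backed by `mmap` (in .NET the corresponding API is `System.IO.MemoryMappedFiles.MemoryMappedFile`). Note that a mapped file is still backed by disk: the OS writes dirty pages back in the background, much as it does for ordinary buffered writes through the page cache, so the likely saving is fewer system calls and copies rather than less disk I/O. Whether that beats a large-buffered sequential write is something to measure. The class name `MappedAppender` and the 64 MiB default capacity are illustrative assumptions. Because Python's `mmap` expects the mapping to fit within the file, the sketch pre-sizes the file and remaps when it fills up.

```python
import mmap


class MappedAppender:
    """Hypothetical append-only writer backed by a memory-mapped file."""

    def __init__(self, path, capacity=64 * 1024 * 1024):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._file = open(path, "w+b")
        self._file.truncate(capacity)  # pre-size the file so it can be mapped
        self._map = mmap.mmap(self._file.fileno(), capacity)
        self._capacity = capacity
        self._pos = 0

    def append(self, data: bytes):
        end = self._pos + len(data)
        if end > self._capacity:
            # Grow geometrically so remapping stays infrequent.
            self._grow(max(end, self._capacity * 2))
        self._map[self._pos:end] = data  # plain memory copy, no write() call
        self._pos = end

    def _grow(self, new_capacity):
        # Unmap before resizing the file, then map the larger file again.
        self._map.flush()
        self._map.close()
        self._file.truncate(new_capacity)
        self._map = mmap.mmap(self._file.fileno(), new_capacity)
        self._capacity = new_capacity

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.truncate(self._pos)  # drop the pre-allocated, unused tail
        self._file.close()
```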
I suppose the data is sparse. Why not compress it and keep it in RAM until you do the aggregation?
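A minimal sketch of that suggestion in Python, assuming each input line's output can be serialized to bytes (here a hypothetical `process_line` that produces tab-separated key/value text) and that the aggregation is a per-key sum. Each worker compresses its output immediately, so only compressed blobs are held in RAM until the final aggregation pass decompresses them one at a time. zlib level 1 is chosen here to favour speed. How much memory this saves depends entirely on how redundant the real output is, so the ratio should be measured on actual data.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_line(line):
    # Hypothetical analytics step producing "key\tvalue" records as text.
    return "".join(f"{word}\t1\n" for word in line.split())


def process_and_compress(line):
    # Compress right away so the uncompressed output is short-lived.
    raw = process_line(line).encode("utf-8")
    return zlib.compress(raw, 1)  # level 1: fastest zlib setting


def aggregate(blobs):
    # Decompress one blob at a time so only one is expanded in memory.
    totals = Counter()
    for blob in blobs:
        for record in zlib.decompress(blob).decode("utf-8").splitlines():
            key, value = record.split("\t")
            totals[key] += int(value)
    return totals


def run(lines, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        blobs = list(pool.map(process_and_compress, lines))
    return aggregate(blobs)
```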