I would like to use a memory-mapped file to write data. I am using the following test code on an Ubuntu machine. The code is compiled with g++ -std=c++14 -O3.
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <cstdlib>
#include <cstdio>
#include <cassert>
int main(){
    constexpr size_t GB1 = 1 << 30;
    size_t capacity = GB1 * 4;
    size_t numElements = capacity / sizeof(size_t);
    int fd = open("./mmapfile", O_RDWR);
    assert(fd >= 0);
    int error = ftruncate(fd, capacity);
    assert(error == 0);
    void* ptr = mmap(0, capacity, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    assert(ptr != MAP_FAILED);
    size_t* data = (size_t*)ptr;
    for(size_t i = 0; i < numElements; i++){
        data[i] = i;
    }
    munmap(ptr, capacity);
}
The data is written to the file correctly. However, the htop command shows that half of the program's disk I/O bandwidth is used by read accesses. My concern is that the code will not perform well if only half the bandwidth can be used for writes.
Why are there read accesses in the code? Can they be avoided or are they expected?
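The reads occur because the first access to each page triggers a page fault, and the OS has to read that page in from disk before your store can complete. The OS is not clairvoyant: it doesn't know that you are going to overwrite the whole page, so it cannot tell that the data it reads will simply be thrown away.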
To avoid the issue, don't use mmap(). Build the blocks in a buffer and write them out the old-fashioned way.
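As a rough illustration of the buffer-and-write approach, here is a minimal sketch. The helper name write_sequence, the 1 MiB chunk size and the use of O_CREAT | O_TRUNC are my own assumptions, not something taken from the question; adjust them to your situation. The idea is that each write() covers whole, aligned blocks, so the kernel generally has no reason to fetch the old contents first.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): writes the values
// 0 .. numElements-1 to `path` through a user-space buffer, using
// plain write() calls instead of a memory mapping.
bool write_sequence(const char* path, size_t numElements) {
    // O_TRUNC discards any old contents, so nothing has to be preserved.
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    // 1 MiB per chunk; an arbitrary choice for this sketch.
    constexpr size_t kChunkElements = (1 << 20) / sizeof(size_t);
    std::vector<size_t> buffer(kChunkElements);

    bool ok = true;
    for (size_t base = 0; ok && base < numElements; base += kChunkElements) {
        // Build the next block in memory.
        size_t count = std::min(kChunkElements, numElements - base);
        for (size_t i = 0; i < count; i++) buffer[i] = base + i;

        // write() may write fewer bytes than requested, so loop until done.
        const char* p = reinterpret_cast<const char*>(buffer.data());
        size_t remaining = count * sizeof(size_t);
        while (remaining > 0) {
            ssize_t n = write(fd, p, remaining);
            if (n < 0) {
                if (errno == EINTR) continue;  // interrupted, retry
                ok = false;
                break;
            }
            p += n;
            remaining -= static_cast<size_t>(n);
        }
    }
    if (close(fd) != 0) ok = false;
    return ok;
}
```

For the 4 GB case in the question you would call something like write_sequence("./mmapfile", numElements). Whether this actually removes the reads on your machine is worth confirming with htop, since it depends on the filesystem and on the writes staying block-aligned.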