I'm trying to use mmap
to read a large file and do some computation over its contents. However, I've noticed that during the computation the entire file (approximately 20 GB) still seems to end up in memory, which confuses me. What should I do to make my operation consume only about 1 GB of memory?
#include <cstdint>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    std::string srcPath = "srcList.bin";
    int srcFd = open(srcPath.c_str(), O_RDONLY);
    struct stat sb;
    if (fstat(srcFd, &sb) == -1) {
        perror("fstat");
        close(srcFd);
        return 1;
    }
    off_t srcLength = sb.st_size;
    int64_t* srcAddr = static_cast<int64_t*>(
        mmap(nullptr, srcLength, PROT_READ, MAP_SHARED, srcFd, 0));
    // srcLength is in bytes, so index in units of int64_t
    for (off_t i = 0; i < srcLength / static_cast<off_t>(sizeof(int64_t)); i++) {
        int64_t tmp = srcAddr[i];
        // do something
    }
}
The entire file only 'seems' to be loaded into memory, and this is due to the way you access the memory-mapped file in your loop. mmap maps the whole file into your process's address space, but it does not read it all into RAM up front.
In your loop you access the elements of the mapped file one at a time via srcAddr[i]. The first time you touch a page, the OS faults that page of the file into memory (typically 4 KiB at a time). Because you access all the elements sequentially, you eventually touch every page of the file, and the pages you have already read stay resident in the page cache until the kernel comes under memory pressure and reclaims them. After a full pass over a 20 GB file, that makes it look as though the entire file has been loaded.
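You can actually watch this demand paging happen. Here is a minimal sketch (assuming Linux and the same srcList.bin file) that uses mincore() to count how many pages of the mapping are resident before and after a single access:

#include <cstdint>
#include <cstdio>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd = open("srcList.bin", O_RDONLY);
    struct stat sb;
    if (fstat(fd, &sb) == -1) { perror("fstat"); return 1; }
    size_t length = sb.st_size;
    void* addr = mmap(nullptr, length, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    long pageSize = sysconf(_SC_PAGESIZE);
    size_t pages = (length + pageSize - 1) / pageSize;
    std::vector<unsigned char> vec(pages);

    // Before any access: typically few or no pages are resident yet.
    mincore(addr, length, vec.data());
    size_t resident = 0;
    for (unsigned char v : vec) resident += v & 1;
    std::printf("resident before access: %zu of %zu pages\n", resident, pages);

    // Touch one element: the kernel faults in that page (plus any readahead).
    volatile int64_t tmp = *static_cast<const int64_t*>(addr);
    (void)tmp;

    mincore(addr, length, vec.data());
    resident = 0;
    for (unsigned char v : vec) resident += v & 1;
    std::printf("resident after one access: %zu of %zu pages\n", resident, pages);

    munmap(addr, length);
    close(fd);
}

If you then run your original loop over the whole mapping, the resident count climbs toward the full page count, which is exactly the 20 GB you are seeing.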
To limit memory use to roughly 1 GiB, you can control the size of the memory-mapped region by mapping only a portion of the file at a time and processing it in chunks. One thing to keep in mind: the offset argument to mmap must be a multiple of the page size, which a chunk size like 1 GiB satisfies. Something like:
off_t chunkSize = 1LL << 30; // 1 GiB; also a multiple of the page size, as mmap's offset requires
for (off_t offset = 0; offset < srcLength; offset += chunkSize) {
    off_t mapSize = std::min(chunkSize, srcLength - offset);
    // Map just this chunk of the file
    void* addr = mmap(nullptr, mapSize, PROT_READ, MAP_SHARED, srcFd, offset);
    if (addr == MAP_FAILED) {
        perror("mmap");
        break;
    }
    int64_t* srcAddr = static_cast<int64_t*>(addr);
    // Process the chunk
    for (off_t i = 0; i < mapSize / static_cast<off_t>(sizeof(int64_t)); i++) {
        int64_t tmp = srcAddr[i];
        // here you do something with tmp
    }
    munmap(srcAddr, mapSize); // unmap the chunk before mapping the next one
}
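An alternative that avoids the repeated mmap/munmap calls is to keep one mapping of the whole file and tell the kernel to discard each window once you are done with it. This is a minimal sketch assuming Linux, where madvise(MADV_DONTNEED) on a read-only, file-backed mapping drops the resident pages (a later access would simply fault them back in from the file):

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int srcFd = open("srcList.bin", O_RDONLY);
    struct stat sb;
    if (fstat(srcFd, &sb) == -1) { perror("fstat"); return 1; }
    off_t srcLength = sb.st_size;

    // One mapping for the whole file; pages are still faulted in on demand.
    void* base = mmap(nullptr, srcLength, PROT_READ, MAP_SHARED, srcFd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }
    char* bytes = static_cast<char*>(base);

    const off_t window = 1LL << 30; // 1 GiB, a multiple of the page size
    for (off_t off = 0; off < srcLength; off += window) {
        off_t len = std::min(window, srcLength - off);
        const int64_t* p = reinterpret_cast<const int64_t*>(bytes + off);
        for (off_t i = 0; i < len / static_cast<off_t>(sizeof(int64_t)); i++) {
            int64_t tmp = p[i];
            (void)tmp; // do something with tmp
        }
        // Drop this window's resident pages so RSS stays near one window.
        madvise(bytes + off, len, MADV_DONTNEED);
    }
    munmap(base, srcLength);
    close(srcFd);
}

Either way, note that pages you have read can linger in the kernel's page cache even after munmap. That memory is reclaimable and does not count against your process, so tools that report it as 'cached' are not showing a leak; if you want to evict the cached file data as well, posix_fadvise(srcFd, offset, mapSize, POSIX_FADV_DONTNEED) can be issued per chunk.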