I'm trying to parse some text files of size up to a few hundred megabytes in a context where performance is important, so I'm using boost mapped_file_source. The parser expects the source to be terminated with a null byte, so I want to check whether the file size is an exact multiple of the page size (and if so, fall back on a slower, non-memory mapped method). I thought I could do this with:
if (mf.size() & (mf.alignment() - 1))
But it turns out on one test file with size 20480, the alignment is 65536 (on Windows 7, 64 bit) and the program is crashing. I think what's going on is that the page size is actually smaller than the alignment, so my test isn't working.
How can I get the page size? Or is there something else I should be doing instead? (I need solutions for both Windows and Linux, willing to write system specific code if necessary but would prefer portable code if possible.)
The simplest thing seems to be fixing the parser so that it takes the end of the input into account (not too outrageous, really).
Next up: a big warning. Relying on trailing bytes in the map (if any) to be zero is undefined¹: http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html
So, just map the file using size+1, and deterministically add the NUL terminator. I don't think this is worth getting into platform specific/undefined behaviour for.
In fact I just learned of boost::iostreams::mapped_file_base::mapmode::priv, which is perfect for your needs:
A file opened with private access can be written to, but the changes will not affect the underlying file [docs]
Here's a simple snippet:
#include <boost/iostreams/device/mapped_file.hpp>
#include <fstream>
#include <iostream>
#include <iterator>
namespace io = boost::iostreams;
int main() {
    // of course, prefer `stat(2)` or `boost::filesystem::file_size()`, but for exposition:
    std::streamsize const length = std::distance(std::istreambuf_iterator<char>(std::ifstream("main.cpp").rdbuf()), {});
    // map one byte more than the file holds; priv means writes never reach the file on disk
    io::mapped_file mf("main.cpp", io::mapped_file_base::mapmode::priv, length+1);
    *(mf.end()-1) = '\0'; // voilà, null termination done, safely, quickly and reliably
    std::cout << length << "\n";
    std::cout << mf.size() << "\n";
}
Alternative spellings:
mf.data()[length] = '\0'; // voilà, null termination done, safely, quickly and reliably
*(mf.begin()+length) = 0; // etc.
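If the page size itself is still wanted, for instance to diagnose the crash in the question, the following is a rough sketch of how it could be queried on both platforms. The function names os_page_size and ends_on_page_boundary are made up for illustration. As far as I know, mapped_file's alignment() reports the allocation granularity on Windows (typically 65536) rather than the page size (typically 4096). That would explain the failed test: 20480 is exactly 5 * 4096, so the file ends on a page boundary even though it is not a multiple of 65536.

```cpp
#include <cstddef>
#if defined(_WIN32)
#  include <windows.h>
#else
#  include <unistd.h>
#endif

// Hypothetical helper: returns the OS page size in bytes, or 0 if unknown.
std::size_t os_page_size() {
#if defined(_WIN32)
    SYSTEM_INFO info;
    ::GetSystemInfo(&info);
    return static_cast<std::size_t>(info.dwPageSize); // not dwAllocationGranularity
#else
    long const n = ::sysconf(_SC_PAGESIZE);
    return n > 0 ? static_cast<std::size_t>(n) : 0;
#endif
}

// Hypothetical helper: true when a file of file_size bytes leaves no slack
// in its last page, i.e. there is no room for a terminator after the data.
bool ends_on_page_boundary(std::size_t file_size, std::size_t page_size) {
    return page_size != 0 && file_size % page_size == 0;
}
```

The check in the question would then read ends_on_page_boundary(mf.size(), os_page_size()), although the size+1 private mapping above avoids needing it at all.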
¹ AFAICT it might kill a bunny or crash your process.