I often use File::Map to map small text files into memory and, for example, run read-only regular expressions over them. Now I have a use case in which I also need to replace some text in the file, and I thought I could still use File::Map, because its documentation says the following:
Files are mapped into a variable that can be read just like any other variable, and it can be written to using standard Perl techniques such as regexps and substr.
While the text I want to replace is replaced correctly within the file, I'm losing data: the new text is slightly longer than the old one, but the file keeps its original size, so the data is truncated at the end. Both issues produce the following documented warnings:
Writing directly to a memory mapped file is not recommended
Truncating new value to size of the memory map
The explanations of both warnings read as if one should never write anything using File::Map, although it might work when one can live with truncated files or when the overall file size does not change at all. But the first quote explicitly mentions writes as supported, without any exception to that rule.
So, is there some special way to write safely using File::Map, e.g. getting the underlying file enlarged? The first warning uses the word "directly", which gives me the feeling that there is some other, better-supported way to write.
I'm currently just using =~ s/// on the mapped view, which seems to be the wrong approach. I couldn't even find anyone trying to write using File::Map at all, only the official tests, which do exactly what I do and expect the warnings I get. Additionally, looking at the code, there seems to be only one case in which writing doesn't result in a warning at all, though I don't understand how I could trigger it:
static int mmap_write(pTHX_ SV* var, MAGIC* magic) {
    struct mmap_info* info = (struct mmap_info*) magic->mg_ptr;
    if (!SvOK(var))
        mmap_fixup(aTHX_ var, info, NULL, 0);
    else if (!SvPOK(var)) {
        STRLEN len;
        const char* string = SvPV(var, len);
        mmap_fixup(aTHX_ var, info, string, len);
    }
    else if (SvPVX(var) != info->fake_address)
        mmap_fixup(aTHX_ var, info, SvPVX(var), SvCUR(var));
    else
        SvPOK_only_UTF8(var);
    return 0;
}
https://metacpan.org/source/LEONT/File-Map-0.55/lib/File/Map.xs#L240
After all, if writing should be avoided at all, why do the docs explicitly mention it as supported? Doesn't look supported to me if it results at least in a warning in all cases but one.
An mmap is a fixed-sized mapping of a portion of a file to memory.
The various mapping functions set the string buffer of the provided scalar to the mapped memory page. The OS will reflect any changes to that buffer to the file and vice versa if requested.
The proper way to work with an mmap is to modify the string buffer, not replace it.
Anything that changes the string buffer without changing its size is appropriate.
$ perl -e'print "\0"x16' >scratch
$ perl -MFile::Map=map_file -we'
map_file my $map, "scratch", "+<";
$map =~ s/\x00/\xFF/g; # ok
substr($map, 6, 2, "00"); # ok
substr($map, 8, 2) = "11"; # ok
substr($map, 7, 2) =~ s/../22/; # ok
'
$ hexdump -C scratch
00000000 ff ff ff ff ff ff 30 32 32 31 ff ff ff ff ff ff |......0221......|
00000010
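The "same size only" rule above can be enforced in code rather than by convention. The following is a minimal sketch, not part of File::Map: replace_same_length is a hypothetical helper that works on any scalar reference, treats the data as bytes, and refuses to touch the buffer unless every match has exactly the length of the replacement. It has not been tested against a mapped variable, only against ordinary strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Map): replace every match of $re
# in the scalar referenced by $buf_ref with $repl, but only if each match
# has exactly the same length as $repl. Nothing is modified otherwise.
sub replace_same_length {
    my ($buf_ref, $re, $repl) = @_;

    # First pass: only read the buffer and record [offset, length] per match.
    my @hits;
    while ($$buf_ref =~ /$re/g) {
        push @hits, [ $-[0], $+[0] - $-[0] ];
    }

    # Refuse the whole operation if any match would change the size.
    for my $hit (@hits) {
        my ($start, $len) = @$hit;
        die sprintf("match at offset %d is %d bytes, replacement is %d bytes\n",
                    $start, $len, length($repl))
            if $len != length($repl);
    }

    # Second pass: 4-argument substr overwrites in place. Because the
    # lengths are equal, the recorded offsets stay valid throughout.
    substr($$buf_ref, $_->[0], $_->[1], $repl) for @hits;

    return scalar @hits;
}

my $buf = "\0" x 16;
my $n   = replace_same_length(\$buf, qr/\x00\x00/, "ab");
print "$n replacements, length is still ", length($buf), "\n";
```

With a mapping, the call would presumably look like replace_same_length(\$map, qr/old/, "new"). Collecting all matches before writing makes the operation all-or-nothing, at the cost of holding the match offsets in memory.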
Anything that replaces the string buffer (such as assigning to the scalar) is not ok.
...kinda. The module notices you've replaced the scalar's buffer. It proceeds to copy the contents of the new buffer to the mapped memory, then replaces the scalar's buffer with the pointer to the mapped memory.
$ perl -e'print "\0"x16' >scratch
$ perl -MFile::Map=map_file -we'
map_file my $map, "scratch", "+<";
$map = "4" x 16; # Effectively: substr($map, 0, 16, "4" x 16)
'
Writing directly to a memory mapped file is not recommended at -e line 3.
$ hexdump -C scratch
00000000 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 |4444444444444444|
00000010
Aside from the warning (which can be silenced using no warnings qw( substr );[1]), the only downside is that doing it this way requires using memcpy to copy length($map) bytes, while using substr($map, $pos, length($repl), $repl) only requires copying length($repl) bytes.
Anything that changes the size of the string buffer is not ok.
$ perl -MFile::Map=map_file -we'
map_file my $map, "scratch", "+<";
$map = "5" x 32; # Effectively: substr($map, 0, 16, "5" x 16)
'
Writing directly to a memory mapped file is not recommended at -e line 3.
Truncating new value to size of the memory map at -e line 3.
$ hexdump -C scratch
00000000 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 |5555555555555555|
00000010
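Since the mapping cannot grow, a replacement that changes the length has to go through ordinary file I/O instead. The following is a minimal sketch of one common approach, using only core modules and no File::Map: slurp the file, substitute in a normal scalar, write the result to a temporary file in the same directory, and rename it over the original. rewrite_file is a hypothetical helper name, and the sketch assumes the file is small enough to hold in memory.

```perl
use strict;
use warnings;
use File::Temp     qw( tempfile );
use File::Basename qw( dirname );

# Hypothetical helper: replace every match of $re in the file at $path
# with the literal string $repl. The file may grow or shrink.
sub rewrite_file {
    my ($path, $re, $repl) = @_;

    # Slurp the whole file as raw bytes.
    open(my $in, '<:raw', $path) or die "Can't open $path: $!\n";
    my $data = do { local $/; <$in> };
    close($in);

    # An ordinary scalar can change size freely.
    my $count = ($data =~ s/$re/$repl/g) || 0;

    # Write to a temporary file in the same directory so that rename()
    # stays on one filesystem. Note: the new file gets the temporary
    # file's permissions, not those of the original.
    my ($out, $tmp) = tempfile(DIR => dirname($path), UNLINK => 0);
    binmode($out);
    print {$out} $data or die "Can't write $tmp: $!\n";
    close($out)        or die "Can't close $tmp: $!\n";
    rename($tmp, $path) or die "Can't rename $tmp to $path: $!\n";

    return $count;
}
```

For example, rewrite_file("scratch", qr/old/, "somewhat longer") would return the number of substitutions made. Any existing mapping of the old file keeps referring to the old contents, so the file would need to be mapped again afterwards.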
WARNING: The module doesn't warn if you shrink the buffer, even though this has no effect except to clobber one of the bytes with a NUL.
$ perl -e'print "\0"x16' >scratch
$ perl -MFile::Map=map_file -we'
map_file my $map, "scratch", "+<";
substr($map, 0, 16, "6" x 16);
substr($map, 14, 2, "");
'
$ hexdump -C scratch
00000000 36 36 36 36 36 36 36 36 36 36 36 36 36 36 00 36 |66666666666666.6|
00000010
I've submitted a ticket.
[1] substr, but I suppose it also warns when using substr "incorrectly".