I was writing a Lua script to read, browse, and edit a large text file. The approach I chose was to 'page through' the file, reading a segment into a buffer variable, then editing it or moving to the next page.
This works great for reading the file, but my current approach to writing the file doesn't save much time when compared to loading the whole file into memory and then dumping it back into the file system.
For writing I'm currently:
Since I have an int that points to the beginning of where buffer was taken from, and another that says what length buffer was originally, would there be any way to replace only the N bytes that were modified, instead of making a whole copy of the file and then moving it?
When treating a file as binary, you can certainly replace fixed length data at a particular file offset:
local file = assert(io.open('data', 'w+b'))
-- write three unsigned longs
file:write(string.pack('LLL', 11, 22, 33)):flush()
-- offset by the size of one unsigned long
file:seek('set', string.packsize 'L')
-- overwrite the second unsigned long
file:write(string.pack('L', 42)):flush()
-- rewind
file:seek('set')
local a, b, c = string.unpack('LLL', file:read(string.packsize 'LLL'))
file:close()
assert(os.remove 'data')
print(a, b, c) -- 11, 42, 33
This is quite common, but leaves your file open to data corruption if any given write operation fails.
Given
read, browse, and edit a large text file
though, the typical text file contains variable length (usually lines of) data.
Trying to directly replace a section of text with a smaller amount of data will lead to stale data remaining in the file (the tail of the previous data). Overwriting a section of text with a larger amount of data leads to data loss, when you overwrite the data that followed the original data.
If you are truly replacing a fixed amount of text (e.g., uppercasing a section of ASCII), then this is not too much different from dealing with binary data, except you may (will) run into problems on certain platforms that perform text mode translations (e.g., Windows, when [over]writing a newline (LF)).
Your approach of appending data to a temporary file, and then renaming the file is incredibly common, and is good practice as it leaves a paper trail that is easier to follow when things go wrong, and generally offers more control over the process.
If we presume file:read
and file:write
(ultimately fread
and fwrite
) to be O(n), then everything mentioned is O(n).