
Will editing a file always have O(N) time complexity?


I was writing a Lua script to read, browse, and edit a large text file. The approach I chose was to 'page through' the file, reading a segment into a buffer variable, then editing it or moving to the next page.
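
A minimal sketch of that paging read (pageSize, pageIndex, and readPage are illustrative names, not the ones in my actual script):

    local pageSize = 4096
    local file = assert(io.open('fileName', 'rb'))
    
    -- read page pageIndex into a string; nil once past the end of the file
    local function readPage(pageIndex)
      file:seek('set', pageIndex * pageSize)
      return file:read(pageSize)
    end
    
    local buffer = readPage(0) -- first page into the buffer variable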

This works great for reading the file, but my current approach to writing the file doesn't save much time when compared to loading the whole file into memory and then dumping it back into the file system.

For writing I'm currently:

  1. opening "tempFile" in append mode
  2. appending bytes [0, N) to the temp file
  3. appending the buffer
  4. appending bytes [N + pageSize, fileSize)
  5. doing os.rename ("tempFile", "fileName")
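
In Lua, what I'm doing is roughly this (writeEdit is a placeholder name; offset is the start of the buffer, i.e. N):

    local function writeEdit(fileName, offset, pageSize, buffer)
      local src = assert(io.open(fileName, 'rb'))
      local dst = assert(io.open('tempFile', 'ab'))  -- step 1
      dst:write(src:read(offset) or '')              -- step 2: bytes [0, N)
      dst:write(buffer)                              -- step 3: the edited page
      src:seek('set', offset + pageSize)             -- skip the replaced page
      dst:write(src:read('a'))                       -- step 4: the tail
      src:close()
      dst:close()
      assert(os.rename('tempFile', fileName))        -- step 5
    end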

Since I have an integer that points to where the buffer was taken from, and another that records the buffer's original length, is there any way to replace only the N bytes that were modified, instead of making a whole copy of the file and then moving it?


Solution

  • When treating a file as binary, you can certainly replace fixed length data at a particular file offset:

    local file = assert(io.open('data', 'w+b'))
    
    -- write three unsigned longs
    file:write(string.pack('LLL', 11, 22, 33)):flush()
    -- offset by the size of one unsigned long
    file:seek('set', string.packsize 'L')
    -- overwrite the second unsigned long
    file:write(string.pack('L', 42)):flush()
    -- rewind
    file:seek('set')
    
    local a, b, c = string.unpack('LLL', file:read(string.packsize 'LLL'))
    
    file:close()
    assert(os.remove 'data')
                   
    print(a, b, c) -- 11, 42, 33
    

    This is quite common, but leaves your file open to data corruption if any given write operation fails.

    Given

        read, browse, and edit a large text file

    though, the typical text file contains variable-length data (usually lines).

    Directly replacing a section of text with a smaller amount of data leaves stale data in the file (the tail of the previous data). Overwriting a section with a larger amount of data loses data, because you overwrite whatever followed the original section.
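
    A hypothetical demonstration of the stale-tail case (the file name and contents are invented for illustration):

    local f = assert(io.open('demo.txt', 'w+b'))
    f:write('hello world')
    f:seek('set')       -- rewind
    f:write('bye')      -- a shorter replacement, intended to supplant 'hello'
    f:seek('set')
    print(f:read('a'))  --> byelo world ('lo' is left over from 'hello')
    f:close()
    os.remove('demo.txt')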

    If you are truly replacing a fixed amount of text (e.g., uppercasing a section of ASCII), then this is not much different from dealing with binary data, except you may (will) run into problems on platforms that perform text-mode translations (e.g., Windows, when [over]writing a newline (LF)).
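
    For instance, a fixed-length uppercase of the bytes at a given offset (N, len, and the file name are illustrative values; opening in binary mode sidesteps the newline-translation issue):

    local f = assert(io.open('notes.txt', 'r+b'))
    local N, len = 5, 4
    f:seek('set', N)
    local section = f:read(len)
    f:seek('set', N)  -- ISO C requires a seek between a read and a write on update streams
    f:write(section:upper())
    f:close()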

    Your approach of appending data to a temporary file and then renaming it is incredibly common, and it is good practice: it leaves a paper trail that is easier to follow when things go wrong, and it generally offers more control over the process.

    If we take file:read and file:write (ultimately fread and fwrite) to be O(n) in the number of bytes transferred, then everything mentioned here is O(n): the in-place overwrite is linear only in the bytes replaced, while the copy-and-rename approach is linear in the size of the whole file.