Search code examples
javascriptorigin-private-file-system

How to efficiently append to a file?


I'm appending data to a log file using OPFS (Origin Private File System browser API) and I noticed it's very slow.

It takes ~22ms to write 5MB to a new file but it also takes ~23ms to append a single byte.

I didn't file anything like an append flag. I had to read the file size and start writing at that position to append data. Is there a better way of appending data?

For comparison, in Node.js it takes 1.3ms to write 5MB to a new file and 0.1ms to append a single byte.

Ignoring native FS vs OPFS write performance numbers, I expect append performance to certainly be better than overriding the entire file with the existing data plus the new data.

OPFS benchmark code:

(async () => {
  async function create5MBFile(fileHandle) {
    console.time('write 5MB');
    const writable = await fileHandle.createWritable();

    const chunk = new Uint8Array(5 * 1024 * 1024).fill(0);
    await writable.write(chunk);

    await writable.close();
    console.timeEnd('write 5MB');
  }

  async function appendByte(fileHandle) {
    const byteToAppend = new Uint8Array([0]);
    console.time('append 1 byte');
    const file = await fileHandle.getFile();
    const currentSize = file.size;

    const writable = await fileHandle.createWritable({ keepExistingData: true });
    await writable.write({ type: 'write', position: currentSize, data: byteToAppend });
    await writable.close();
    console.timeEnd('append 1 byte');
  }

  const root = await navigator.storage.getDirectory();
  const fileHandle = await root.getFileHandle('append.log', { create: true });
  await create5MBFile(fileHandle);
  await appendByte(fileHandle);
})();

Node.js benchmark:

const fs = require('fs');
const path = require('path');
const filePath = path.join(__dirname, 'append.log');

function create5MBFileSync() {
  const fileBuffer = Buffer.alloc(5 * 1024 * 1024, 0);
  console.time('write 5MB');
  fs.writeFileSync(filePath, fileBuffer);
  console.timeEnd('write 5MB');
}

function appendByteSync() {
  const byteToAppend = Buffer.from([0]);
  console.time('append 1 byte');
  fs.appendFileSync(filePath, byteToAppend);
  console.timeEnd('append 1 byte');
}

create5MBFileSync();
appendByteSync();

Solution

  • I didn't file anything like an append flag. I had to read the file size and start writing at that position to append data. Is there a better way of appending data?

    The current design of the createWritable method veers on the side of caution:

    Returns a FileSystemWritableFileStream that can be used to write to the file. Any changes made through stream won’t be reflected in the file entry locatable by fileHandle’s locator until the stream has been closed. User agents try to ensure that no partial writes happen, i.e. the file will either contain its old contents or it will contain whatever data was written through stream up until the stream has been closed.

    ...by creating an eager (and expensive) copy of the File's contents (inside createWritable( { keepExistingData: true } )) first, before any other writes, it means that if the FileSystemWritableFileStream is repositioned to overwrite existing data and if the greater-operation fails at some point (so FileSystemWritableFileStream.close() is never called) then the File can always be reverted back to its original contents.

    Whereas in a conventional OS and filesystem, if you open an existing file for writing/append, and then reposition the write-head in older data, then overwrite enough existing data then the OS's buffers will be flushed to disk eventually, leaving the file in an indeterminate and unrecoverable state if something bad happens before you can flush and close properly.

    ...though at least conventional OS/filesystems do offer some kind of "real" append mode and/or a way to explicitly choose between Open-Existing vs. Truncate (O_APPEND and O_TRUNC on POSIX, TRUNCATE_EXISTING on Windows, though on Windows you need to move the stream's head yourself).


    It takes ~22ms to write 5MB to a new file but it also takes ~23ms to append a single byte.

    I feel your benchmark has a slight flaw: using console.time means making a blocking synchronous call (just like console.log). So I took your benchmark and replaced console.time with performance.mark and I get much lower overall nubmers - I also measure each step separately - and the results show that writing a single byte really is fast - the slowdown in appendByte is all because of the safety-copying in await fileHandle.createWritable( { keepExistingData: true } ):

    Posted to JSFiddle (as OPFS has issues running in StackOverflow's Snippets): https://jsfiddle.net/v6phgfba/

    async function run() {
    
      const root       = await navigator.storage.getDirectory();
      const fileHandle = await root.getFileHandle('append.log', { create: true });
      
      await create5MBFile(fileHandle); // ~13ms
      
      await appendByte(fileHandle, { keepExistingData: true, mode: "exclusive" } ); // ~19ms
      
      await appendByte(fileHandle, { mode: "exclusive" } ); // ~4ms (but doesn't actually append: the data is lost)
    };
    
    
    async function create5MBFile(fileHandle/*: FileSystemFileHandle*/) {
      window.performance.mark('start');
    
      const chunk = new Uint8Array(5 * 1024 * 1024).fill(0);
      window.performance.mark('allocated 5MB');
    
      const writable = await fileHandle.createWritable();
      window.performance.mark('got writeable');
      try {
        await writable.write(chunk);
        window.performance.mark('wrote 5MB');
      }
      finally {
        await writable.close();
        window.performance.mark('closed writeable');
    
        const measurements = [
          window.performance.measure( 'M1: allocation'   , /*from:*/ 'start'        , /*to:*/ 'allocated 5MB'    ), //  1.5ms
          window.performance.measure( 'M2: get writeable', /*from:*/ 'allocated 5MB', /*to:*/ 'got writeable'    ), //  0.9ms
          window.performance.measure( 'M3: write time'   , /*from:*/ 'got writeable', /*to:*/ 'wrote 5MB'        ), // 10.3ms
          window.performance.measure( 'M4: close time'   , /*from:*/ 'wrote 5MB'    , /*to:*/ 'closed writeable' )  //  1.5ms
        ];
        console.log( measurements );
        console.log( "create5MBFile: total time: %f ms", measurements.reduce( ( a, m ) => a + m.duration, 0 ) );
      }
    };
    
    async function appendByte(fileHandle/*: FileSystemFileHandle*/, writeableOpts) {
      window.performance.mark('start');
      
      const byteToAppend = new Uint8Array([0]);
      window.performance.mark('allocated 1B');
    
      const file        = await fileHandle.getFile();
      const currentSize = file.size;
      window.performance.mark('got File/Blob length');
    
      const writable = await fileHandle.createWritable(writeableOpts);
      window.performance.mark('got writable');
      try {
        await writable.write({ type: 'write', position: currentSize, data: byteToAppend });
        window.performance.mark('wrote 1B after-end');
      }
      finally {
        await writable.close();
        window.performance.mark('closed writeable');
        
        const measurements = [
          window.performance.measure( 'M1: allocation'   , /*from:*/ 'start'               , /*to:*/ 'allocated 1B'          ), // 0ms
          window.performance.measure( 'M2: get length'   , /*from:*/ 'allocated 1B'        , /*to:*/ 'got File/Blob length'  ), // 0.2ms
          window.performance.measure( 'M3: get writeable', /*from:*/ 'got File/Blob length', /*to:*/ 'got writable'          ), // 17.4ms
          window.performance.measure( 'M4: write time'   , /*from:*/ 'got writable'        , /*to:*/ 'wrote 1B after-end'    ), // 0.9ms
          window.performance.measure( 'M5: close time'   , /*from:*/ 'wrote 1B after-end'  , /*to:*/ 'closed writeable'      )  // 1.7ms
        ];
        console.log( measurements );
        console.log( "appendByte: total time: %f ms", measurements.reduce( ( a, m ) => a + m.duration, 0 ) );
      }
    };
    
    run();
    

    My results (Chrome 127 x64 on Windows 10 22H2 x64) after running the above script 3 times in succession are:

    [
        {
            duration : 1.5999999940395355,
            entryType: "measure",
            name     : "M1: allocation",
            startTime: 141.10000002384186
        },
        {
            duration : 0.9000000059604645,
            entryType: "measure",
            name     : "M2: get writeable",
            startTime: 142.7000000178814
        },
        {
            duration : 10.099999994039536,
            entryType: "measure",
            name     : "M3: write time",
            startTime: 143.60000002384186
        },
        {
            duration : 1,
            entryType: "measure",
            name     : "M4: close time",
            startTime: 153.7000000178814
        }
    ]
    /* "create5MBFile: total time: 13.599ms */,
    
    [
        {
            duration : 0,
            entryType: "measure",
            name     : "M1: allocation",
            startTime: 155.2000000178814
        },
        {
            duration : 0.29999998211860657,
            entryType: "measure",
            name     : "M2: get length",
            startTime: 155.2000000178814
        },
        {
            duration : 19.30000001192093,
            entryType: "measure",
            name     : "M3: get writeable",
            startTime: 155.5
        },
        {
            duration : 5.0999999940395355,
            entryType: "measure",
            name     : "M4: write time",
            startTime: 174.80000001192093
        },
        {
            duration : 1.300000011920929,
            entryType: "measure",
            name     : "M5: close time",
            startTime: 179.90000000596046
        }
    ]
    /*"appendByte: total time: 26ms */
    
    [
        {
            duration : 0,
            entryType: "measure",
            name     : "M1: allocation",
            startTime: 181.40000000596046
        },
        {
            duration : 0.4000000059604645,
            entryType: "measure",
            name     : "M2: get length",
            startTime: 181.40000000596046
        },
        {
            duration : 0.9000000059604645,
            entryType: "measure",
            name     : "M3: get writeable",
            startTime: 181.80000001192093
        },
        {
            duration : 0.699999988079071,
            entryType: "measure",
            name     : "M4: write time",
            startTime: 182.7000000178814
        },
        {
            duration : 1.2000000178813934,
            entryType: "measure",
            name     : "M5: close time",
            startTime: 183.40000000596046
        }
    ]
    /*"appendByte: total time: 3.200ms",  */
    
    

    Is there a better way of appending data?


    That said - ~10ms is literally a fraction of a blink-of-an-eye - if the "cost" of safe(r) file writes is an imperceivable lag in writes then I'm all for it.