node.js atomic writefile fsync

node fs.fsync (when to use?)


I want to safely write a file, and I want to understand the proper use and place for fsync.

https://linux.die.net/man/2/fsync

After reading ^ that, I am puzzled as to where to effectively use it.

Question, do I:

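// Assumes `data` (the string to write) and `cb` are defined by the caller.
fs.writeFile('temp/file.txt',data,'utf-8',function(error){
    if(error){fs.unlink('temp/file.txt',function(){cb(error,undefined);});}
    else{
        fs.rename('temp/file.txt','real/file.txt',function(error){
            if(error){return cb(error,undefined);}
            // fs.fsync needs a file descriptor, so open the renamed file first
            fs.open('real/file.txt','r+',function(error,fd){
                if(error){return cb(error,undefined);}
                fs.fsync(fd,function(error){
                    fs.close(fd,function(){cb(error||undefined,error?undefined:true);});
                    });
                });
            });
        }
    });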

I'm writing something that performs many file changes. I have looked at modules that do atomic writes, but I would like to understand the process.


Solution

  • fsync is one of those functions that you'll very rarely need to call.

    All operating systems mask the fact that storage devices are slow by caching reads and writes. When you write to a file, the OS doesn't immediately write to the actual storage medium; it captures the data in a cache, tells your program that the write has completed, and writes the contents to the storage device in the background. The operating system keeps everything consistent, though: if another application reads from that file, it sees the new contents, because the OS serves them from the cache.

    Note that this isn't universal; I believe Windows disables caching for removable storage devices to prevent data loss when people pull the drive out. There are also flags you can pass to open() (for example O_SYNC or O_DIRECT on Linux) to make writes synchronous or to bypass the cache.

    For almost all use cases, you don't need to care that this happens. The only upshot for you is that your program can continue faster. There are some cases where this is problematic though:

    • If power is lost, the contents of the cache are lost, so the disk won't have all the new contents of the file.
    • If the drive is removed, pending writes are lost as well. This is pretty typical for removable storage devices, and I'm pretty sure 90% of people ignore the "safely remove" prompt ;).
    • I think reading directly from a device (i.e. /dev/sdX in Linux) will bypass this cache, but I'm not 100% sure.

    Databases are a typical example of where it is needed. When you run an update query, the database will normally update its in-memory state and write the mutation to a transaction log. Reliability matters for a database, so it writes to the transaction log and does an fsync on that file before responding to the user (or it has opened the transaction log as unbuffered), so there's some level of guarantee that the transaction has been persisted.
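
The transaction-log pattern described above could be sketched in Node roughly as follows. This is only an illustration, not how any particular database does it: `appendDurably` is a hypothetical helper name, the JSON-lines record format is an assumption, and the synchronous fs calls are used purely to keep the example short.

```javascript
const fs = require('fs');

// Hypothetical helper: append one record to a log file and do not return
// until the OS has been asked to flush it to the storage device.
function appendDurably(logPath, record) {
  const fd = fs.openSync(logPath, 'a');                // create if missing, append
  try {
    fs.writeSync(fd, JSON.stringify(record) + '\n');   // lands in the OS cache
    fs.fsyncSync(fd);                                  // ask the OS to flush it
  } finally {
    fs.closeSync(fd);
  }
  return true; // only at this point would a caller acknowledge the write
}
```

The point of the ordering is that the acknowledgement happens after the fsync, so a power loss right after the reply should not lose the record.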

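    In your example, the fsync is meant to ensure that the renamed file has actually been flushed to disk. Two caveats: fs.fsync operates on an open file descriptor rather than a path, and, according to the fsync man page linked in the question, syncing a file does not necessarily ensure that its directory entry has reached disk; an explicit fsync on a descriptor for the containing directory is needed for that, which is what makes the rename itself durable.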
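
For the write-then-rename case in the question, one common ordering is: write to a temporary file, fsync it, rename it over the target, then fsync the containing directory. The sketch below assumes a POSIX-like system (the directory fsync is treated as best effort because it is not supported everywhere, e.g. on Windows). `atomicWriteSync` and the temp-file naming scheme are hypothetical, and synchronous calls are used only for brevity; `fs.fsync` and `fs.rename` are the callback equivalents.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: replace `target` with `data` so that, after a crash,
// the file should hold either the old contents or the complete new contents.
function atomicWriteSync(target, data) {
  // Keep the temp file next to the target so the rename stays on one filesystem.
  const tmp = `${target}.${process.pid}.tmp`;
  try {
    const fd = fs.openSync(tmp, 'w');
    try {
      fs.writeFileSync(fd, data, 'utf-8');
      fs.fsyncSync(fd);            // flush the new contents before renaming
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(tmp, target);    // swap the new file into place
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // clean up the temp file if anything failed
    throw err;
  }
  // Best effort: flush the directory entry so the rename itself is persisted.
  try {
    const dirFd = fs.openSync(path.dirname(target), 'r');
    try {
      fs.fsyncSync(dirFd);
    } finally {
      fs.closeSync(dirFd);
    }
  } catch (err) {
    // Directory fsync is not available on every platform; ignored here.
  }
}
```

Calling `atomicWriteSync('real/file.txt', contents)` would then stand in for the write/rename/fsync chain in the question. Note that the fsync on the file comes before the rename, not after it.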