
Can I safely use a `soft,rw` NFS mount if my application writes new files with a temporary name and then renames the file?


It appears that some people have a mantra about NFS that goes "use the soft option only when client responsiveness is more important than data integrity". Is this true in general, or only for specific usage patterns?

Considering that writing in the middle of an existing file is potentially risky even with local filesystems (e.g. in case of power loss), I would expect well-written user mode applications to write a new file in the same directory under a temporary name and then move the new file over the existing file. I know that for local filesystems the rename operation is atomic on any POSIX system.

How about an NFS mount point with flags soft,rw? Is there potential for corruption if the application always writes to a unique temporary filename which is then renamed over the existing file to update the data? Is it important to check the return value of close() before renaming the file?

Examples of sources that claim that NFS with flags soft,rw is unsafe:


Solution

  • Broadly, the answer to the title question is "Yes, if your use cases are simple, due to close-to-open consistency".


    Anatomical explanation:

    Server: The server owns the filesystem, and is responsible for the filesystem consistency guarantees. These guarantees include:

    • full metadata consistency
    • full file-level data consistency upon explicit synchronization
    • implicit file-level data consistency upon closing the file (close-to-open consistency)*

    The contract between the client and server is governed by the NFS spec.

    Client: The client owns the data and metadata presented to applications, and the client-side data and metadata cache where this information is taken from. It is responsible for populating, invalidating and flushing this cache against the server as appropriate.
    For example, when you close a file or call fsync(), the client is responsible for flushing its local cache and calling COMMIT (NFSv4 is more complex).
    Importantly, if the client is performing a cache-invalidating operation such as WRITE, it enters a state where the attributes of the file, and sometimes the data itself, are not known, and it therefore cannot allow an application to rely on the cached version. While in that state, it just retries the operation against the server indefinitely. It may sometimes be able to return EAGAIN to the application; however, not all system calls allow EAGAIN (e.g. stat()). Therefore, its only resort is to block such system calls until the cache is valid again. This behavior is called a "hard mount".
    The contract between the application and the client is only specified to the degree that it's documented in the man page. In particular, NFS does NOT fully implement the POSIX standard. Implicitly, most kernel-based clients not only assume the server implements close-to-open consistency, but also implement it for the application - namely, they flush the cache when the file is closed as described above.

    soft: a soft mount is a slightly different contract between the client and the application: instead of blocking the application until the data is available, after some timeout EIO is returned (there's also softerr, which causes ETIMEDOUT to be returned instead).


    Given the above, let's go over what happens with soft mounts when you write regularly vs when you rename:

    Let's say you exported a DB table to a file by using regular writes, and then the server became unavailable, and then you closed the file.
    Before the call to close(), some of the data was flushed from the client's cache to the server (using NFS WRITEs), and some was not. This doesn't have to happen in any particular order: the last bytes of the file may have been flushed while the first ones were not.
    The server, in turn, may have flushed some of those WRITEs to disk, but not all. It may also have crashed.
    Now, when the exporting application closes the file, let's say it gets EIO from close() - what does it do with it? Typically, it just prints it and exits. You can't even delete the file because the server is down.
    Then, when you try to import the table back, you might be lucky and get EIO because the server is still unavailable.
    But if the server is back up, the reader may see the file with the right size, but a bunch of missing data, with no way to know it's missing. That's your data corruption.

    Now let's say that you instead write to a temporary file and then rename.
    You start by deleting the target file, then write to a temporary file. Before you close it, the server crashes, and the server's disk now contains an inconsistent table, as above.
    Now, before renaming, you close the file (or call fsync()), and check the error. If you didn't get an error, you rename. Again you can't delete the source or target files because the server is down.
    So now when you try to import the table back, you get ENOENT and you know there's a problem.
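    The safe pattern described above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the `durable_replace` name and the temporary-file naming scheme are my own, not part of any API. The key point is that every step that talks to the server can raise OSError (e.g. EIO on a failed soft mount), and the rename only happens if fsync() and close() both succeeded:

    ```python
    import os

    def durable_replace(data: bytes, target: str) -> None:
        """Write data to a unique temporary file in the target's directory,
        flush it to the server, then atomically rename it over the target.
        Any failing step raises OSError, and the rename is skipped."""
        dirname = os.path.dirname(target) or "."
        # Unique temporary name in the SAME directory, so rename() stays atomic.
        tmp = os.path.join(dirname, f".{os.path.basename(target)}.{os.getpid()}.tmp")
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)   # flush the client cache to the server; errors surface here
        finally:
            os.close(fd)   # on NFS, close() also flushes; its error matters too
        # Only reached if writing, fsync() and close() all succeeded.
        os.rename(tmp, target)  # readers see the old file or the new one, never a mix
    ```

    If any step fails, the target either still holds the previous complete version or does not exist at all, which is exactly the detectable state the answer describes.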

    Note that by "simple use case" I mean:

    • You close the file or call fsync() before rename.
    • The reader of the files can handle the error (ENOENT in the above example).
    • You can delete the target file before you start, or your reader (import in the above example) can handle out-of-date files.
    • You only require file-level consistency, not cross-file consistency (think about what happens if the table data is in one file and the number of rows in the table is in another file).
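
    On the reader side, the "simple use case" conditions above amount to treating a missing file as a recoverable condition rather than silent corruption. A sketch (the `read_table` helper is hypothetical, named after the import example above):

    ```python
    def read_table(path: str) -> bytes | None:
        """Return the file contents, or None if the writer never completed
        its rename (ENOENT). Other errors, such as EIO on an unreachable
        soft mount, propagate so the caller can retry or report them."""
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None  # the export never finished; fall back or retry later
    ```

    Because the writer only renames after a successful flush, the reader either gets a complete file, a clean "not there" signal, or a hard error — never a partially written table.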

    * Side note: close-to-open consistency is an NFS thing. POSIX doesn't guarantee that closing a file will flush it and provide data consistency; you have to explicitly call fsync() and check its return value. Most of the time this isn't a problem, but if consistency is critical you should call fsync() before close(). NFS's close-to-open consistency means that close() effectively behaves like fsync(): the client flushes its dirty data to the server before close() returns.