
Is it safe for two threads to write identical content to the same file?


Suppose a program has a caching mechanism: at the end of some specific calculation, the program writes the output of that calculation to disk so that it does not have to recompute it later, when the program is re-run. It does so for a large number of calculations and saves each output to a separate file (one per calculation, with the filename determined by hashing the computation parameters). The data is written to the file with standard C++ streams:

    void* data = /* result of computation */;
    std::size_t dataSize = /* size of the result in bytes */;
    std::string cacheFile = /* unique filename for this computation */;

    std::ofstream out(cacheFile, std::ios::binary);
    // Store the size as raw bytes: operator<< would write decimal text with no delimiter.
    out.write(reinterpret_cast<const char *>(&dataSize), sizeof(dataSize));
    out.write(static_cast<const char *>(data), dataSize);
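The question does not say which hash produces the file names. As an illustration only, here is a minimal sketch that assumes the computation parameters have already been serialized to a string; `fnv1a64` and `cacheFileName` are hypothetical helpers, and 64-bit FNV-1a is just one convenient choice. Whatever hash is used should be stable across runs, which `std::hash` does not guarantee (it is only required to be consistent within a single execution of the program).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// 64-bit FNV-1a: the result depends only on the input bytes, so the same
// parameters map to the same file name in every run and with every compiler.
std::uint64_t fnv1a64(std::string_view bytes) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ull;  // FNV prime
    }
    return h;
}

// Hypothetical helper: 16 hex digits of the hash plus a fixed extension.
std::string cacheFileName(std::string_view serializedParams) {
    char buf[17];  // 16 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(fnv1a64(serializedParams)));
    return std::string(buf) + ".cache";
}
```

With a 64-bit hash, two different parameter sets could in principle collide and share a file, so a cache that must never return a wrong result might also store the parameters inside the file and compare them on load.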

The calculation is deterministic, hence the data written to a given file will always be the same.

Question: is it safe for multiple threads (or processes) to attempt this simultaneously, for the same calculation, and with the same output file? It does not matter if some threads or processes fail to write the file, as long as at least one succeeds, and as long as all programs are left in a valid state.

In the manual tests I ran, no program failure or data corruption occurred, and the file was always created with the correct content, but this may be platform-dependent. For reference, in our specific case, the size of the data ranges from 2 to 50 kilobytes.


Solution

  • is it safe for multiple threads (or processes) to attempt this simultaneously, for the same calculation, and with the same output file?

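    Multiple threads or processes writing the same file at the same time is a race condition, and you may end up with a corrupted file. There is no guarantee that ofstream::write is atomic; whether it is depends on the particular filesystem. Moreover, std::ofstream truncates the file when it opens it, so another process that reads the cache at that moment may see an empty or partially written file.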

    A robust solution, which works with multiple threads and/or processes, is:

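    1. Write into a temporary file with a unique name in the destination directory, so that the temporary and the final files are on the same filesystem and rename does not have to move data (on POSIX, rename fails with EXDEV across filesystems).
    2. Rename the temporary file to its final name. On POSIX systems, rename atomically replaces the existing file if there is one, so readers see either the old file or the complete new one. The non-portable Linux renameat2 is more flexible; for example, its RENAME_NOREPLACE flag fails instead of replacing an existing file.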
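The two steps above can be sketched with C++17 `std::filesystem`. This is a minimal sketch, not a hardened implementation: `writeCacheFileAtomically` is a hypothetical name, the temporary-file naming scheme (random salt plus counter) is an assumption, and the size is stored as raw bytes before the payload.

```cpp
#include <atomic>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Sketch: write the entry to a uniquely named temporary file in the same
// directory as finalPath, then rename it over finalPath. Returns true if this
// caller's rename succeeded. On failure, only this caller's temp file is removed.
bool writeCacheFileAtomically(const fs::path& finalPath,
                              const void* data, std::size_t dataSize) {
    // Temp name = final name + random salt + per-process counter. The salt is
    // meant to separate processes, the counter separates calls within one
    // process (assumes std::random_device is non-deterministic on this platform).
    static std::atomic<unsigned long long> counter{0};
    std::random_device rd;
    const unsigned long long salt =
        (static_cast<unsigned long long>(rd()) << 32) ^ rd();
    fs::path tmpPath = finalPath;
    tmpPath += ".tmp." + std::to_string(salt) + "." + std::to_string(counter++);

    {
        std::ofstream out(tmpPath, std::ios::binary);
        // Size header as raw bytes, then the payload.
        out.write(reinterpret_cast<const char*>(&dataSize), sizeof(dataSize));
        out.write(static_cast<const char*>(data),
                  static_cast<std::streamsize>(dataSize));
        out.close();  // flushes; a failed open, write or flush leaves failbit set
        if (!out) {
            std::error_code ignored;
            fs::remove(tmpPath, ignored);
            return false;
        }
    }

    // On POSIX this atomically replaces finalPath if it already exists, so a
    // reader sees either no file, the old file, or the complete new one.
    std::error_code ec;
    fs::rename(tmpPath, finalPath, ec);
    if (ec) {
        std::error_code ignored;
        fs::remove(tmpPath, ignored);
        return false;
    }
    return true;
}
```

Each thread or process can call this independently for the same `finalPath`: every successful rename installs a complete file with identical content, so it does not matter which one lands last. On Windows, the rename may fail while another process has the destination open, which still fits the requirement that some writers may fail as long as one succeeds. Surviving a power failure would additionally require syncing the file (and directory) to disk, which standard C++ streams cannot do portably.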