Say you have a file named `foo` that contains some definite sequence of bytes X, and you want to atomically replace it with a file named `bar` that contains a byte sequence Y. This is typically done with the `rename()` system call---in this case, by invoking `rename("bar", "foo")`. However, you want the two following constraints to be observed:

1. `bar` does contain the data Y; otherwise the operation should fail.
2. `foo` does contain the data X; otherwise the operation should fail.

How to do that correctly?
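For reference, the plain, unguarded replacement is just a one-liner; this sketch checks nothing about the contents of either file:

```c
#include <stdio.h>

int main(void)
{
    /* Atomically repoints the name "foo" at the file that was named
     * "bar"; it says nothing about which bytes either file holds. */
    if (rename("bar", "foo") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}
```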
To prevent `foo` and `bar` from being edited before we call `rename()`, we can lock them with `fcntl()` or equivalent. But locks only help to prevent modifications of the file data; they have no effect on directory entries, so by the time `rename()` does its magic, the data that `foo` or `bar` refer to might not be the same.
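A minimal sketch of that locking approach, assuming POSIX advisory locks (which only constrain cooperating processes; error handling trimmed):

```c
/* Sketch of the lock-then-rename idea using POSIX advisory locks.
 * The locks apply to the two inodes we opened; the *names* "foo"
 * and "bar" can still be repointed at other inodes before the
 * rename() below runs, which is exactly the race described next. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int lock_whole_file(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;             /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                    /* 0 = lock to end of file */
    return fcntl(fd, F_SETLKW, &fl); /* wait until granted */
}

int main(void)
{
    int foo = open("foo", O_RDWR);
    int bar = open("bar", O_RDWR);
    if (foo < 0 || bar < 0
        || lock_whole_file(foo) < 0 || lock_whole_file(bar) < 0) {
        perror("open/lock");
        return 1;
    }

    /* ... read through the fds and verify foo holds X, bar holds Y ... */

    /* The locks pin the data of these two inodes, not the names,
     * so this may still move the wrong file into place. */
    if (rename("bar", "foo") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}
```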
Two examples of data loss scenarios, one for each of the two constraints described above.

First scenario:

1. We lock `bar` and make sure that it contains the data Y.
2. Just before we replace `foo` with `bar`, some program replaces `bar` with a file previously named `qux` that holds the data Z.
3. We replace `foo` with `bar`.
4. `foo`, which we expected to contain the data of `bar`, instead contains the data of `qux`. Both the data of `foo` and the data of `bar` are lost.

Second scenario:

1. We lock `foo` and make sure that it contains the data X.
2. Just before we replace `foo` with `bar`, some program replaces `foo` with a file previously named `qux` that holds the data Z.
3. We replace `foo` with `bar`.
4. `foo` does contain the data of `bar`, but the data of the file `qux` has been lost in the process.

Based on your comment:
> It's for a deduplication tool. I want to replace foo with a link to another file that holds the same data as foo, without losing data in the process.
I think you have an XY problem. You cannot make the `rename` operation atomic with respect to the contents of the files. But your goal is just to avoid data loss if a file changes unexpectedly during the deduplication process. That's amenable to other approaches, like keeping a hardlink to the old file, performing the rename, comparing to detect that the data changed, and restoring the old file (either to the original name, or to a special recovery area) if it did.
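A sketch of that scheme, assuming both files live on the same filesystem (so `link()` works) and using an illustrative `contents_equal()` helper rather than any real API:

```c
/* Sketch of the recovery scheme described above: pin the old inode
 * of "foo" with an extra hard link, do the rename, then compare.
 * If foo's old data still matches the new foo (they were supposed
 * to be duplicates), the link is dropped; if not, the old data
 * survives under recovery_name. All names are illustrative. */
#include <stdio.h>
#include <unistd.h>

/* Illustrative helper: byte-by-byte file comparison. */
static int contents_equal(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    int ca = EOF, cb = EOF, eq = (fa && fb);
    if (eq) {
        do { ca = getc(fa); cb = getc(fb); } while (ca == cb && ca != EOF);
        eq = (ca == cb);
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return eq;
}

int replace_with_recovery(const char *foo, const char *bar,
                          const char *recovery_name)
{
    if (link(foo, recovery_name) != 0)  /* pin the old data */
        return -1;

    if (rename(bar, foo) != 0) {        /* the atomic name swap */
        unlink(recovery_name);
        return -1;
    }

    if (contents_equal(foo, recovery_name)) {
        unlink(recovery_name);          /* nothing was lost */
        return 0;
    }
    /* foo changed under us; its old data survives at recovery_name
     * for the caller (or the user) to deal with. */
    return 1;
}
```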
However, there are a lot of fundamental issues that still make this problematic, starting with at least the following:
- A process may have an open handle for write on the old file, without having modified it yet, and may modify and close it after you deduplicate it. In that case, the close operation will orphan it and the data will be lost.
- Any process intending to modify one of the files being deduplicated will modify all duplicates at the same time once they're hard-linked, probably contrary to your expectation.
If your goal is deduplicating to save space while keeping the semantics that allow independent modification, you really need a filesystem that deduplicates filesystem blocks with copy-on-write semantics, not hard links. On the other hand, if you want hard links, you should treat the whole tree being deduplicated as essentially read-only during and after the deduplication operation.
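If that route is available, Linux exposes block-level dedup directly as the `FIDEDUPERANGE` ioctl (supported by btrfs and XFS, among others): the kernel verifies that the given ranges are byte-identical and only then shares the extents, so a concurrent modification makes the call report `FILE_DEDUPE_RANGE_DIFFERS` instead of losing data. That is essentially the fail-instead-of-clobber semantics the question asks for. A minimal sketch:

```c
/* Sketch: share the blocks of DEST with SRC via FIDEDUPERANGE.
 * The kernel compares the ranges under its own locks and refuses
 * to share them unless they are identical, so no data can be lost. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s SRC DEST\n", argv[0]);
        return 2;
    }
    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_RDWR);
    struct stat st;
    if (src < 0 || dst < 0 || fstat(src, &st) < 0) {
        perror("open/stat");
        return 1;
    }

    /* One destination range; info[] may describe several files.
     * (The kernel caps each call at 16 MiB, so real tools loop.) */
    struct file_dedupe_range *r =
        calloc(1, sizeof *r + sizeof(struct file_dedupe_range_info));
    if (!r)
        return 1;
    r->src_offset = 0;
    r->src_length = st.st_size;
    r->dest_count = 1;
    r->info[0].dest_fd = dst;
    r->info[0].dest_offset = 0;

    if (ioctl(src, FIDEDUPERANGE, r) < 0) {
        perror("FIDEDUPERANGE");
        return 1;
    }
    if (r->info[0].status == FILE_DEDUPE_RANGE_DIFFERS) {
        fprintf(stderr, "contents differ, nothing shared\n");
        return 1;
    }
    printf("deduplicated %llu bytes\n",
           (unsigned long long)r->info[0].bytes_deduped);
    return 0;
}
```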