Say you have a file named `foo` that contains some definite sequence of bytes X, and you want to atomically replace it with a file named `bar` that contains a byte sequence Y. This is typically done with the `rename()` system call---in this case, by invoking `rename("bar", "foo")`. However, you want the two following constraints to be observed:

1. `bar` does contain the data Y; otherwise the operation should fail.
2. `foo` does contain the data X; otherwise the operation should fail.

How to do that correctly?
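For reference, the plain, unguarded replacement is just a one-liner; this sketch checks nothing about the contents of either file:

```c
#include <stdio.h>

int main(void)
{
    /* Atomically repoints the name "foo" at the file that was named
     * "bar"; it says nothing about which bytes either file holds. */
    if (rename("bar", "foo") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}
```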
To prevent `foo` and `bar` from being edited before we call `rename()`, we can lock them with `fcntl()` or equivalent. But locks only help to prevent modifications of the file data; they have no effect on directory entries, so by the time `rename()` does its magic, the data that `foo` or `bar` refer to might not be the same.
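A minimal sketch of that locking approach, assuming POSIX advisory locks (which only constrain cooperating processes; error handling trimmed):

```c
/* Sketch of the lock-then-rename idea using POSIX advisory locks.
 * The locks apply to the two inodes we opened; the *names* "foo"
 * and "bar" can still be repointed at other inodes before the
 * rename() below runs, which is exactly the race described next. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int lock_whole_file(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;             /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                    /* 0 = lock to end of file */
    return fcntl(fd, F_SETLKW, &fl); /* wait until granted */
}

int main(void)
{
    int foo = open("foo", O_RDWR);
    int bar = open("bar", O_RDWR);
    if (foo < 0 || bar < 0
        || lock_whole_file(foo) < 0 || lock_whole_file(bar) < 0) {
        perror("open/lock");
        return 1;
    }

    /* ... read through the fds and verify foo holds X, bar holds Y ... */

    /* The locks pin the data of these two inodes, not the names,
     * so this may still move the wrong file into place. */
    if (rename("bar", "foo") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}
```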
Two examples of data loss scenarios, one for each of the two constraints described above.

First scenario:

1. We lock `bar` and make sure that it contains the data Y.
2. Just before we replace `foo` with `bar`, some program replaces `bar` with a file previously named `qux` that holds the data Z.
3. We replace `foo` with `bar`.
4. `foo`, which we expected to contain the data of `bar`, instead contains the data of `qux`. Both the data of `foo` and the data of `bar` are lost.

Second scenario:

1. We lock `foo` and make sure that it contains the data X.
2. Just before we replace `foo` with `bar`, some program replaces `foo` with a file previously named `qux` that holds the data Z.
3. We replace `foo` with `bar`.
4. `foo` does contain the data of `bar`, but the data of the file `qux` has been lost in the process.

Based on your comment:
> It's for a deduplication tool. I want to replace foo with a link to another file that holds the same data as foo, without losing data in the process.
I think you have an XY problem. You cannot make the `rename` operation atomic with respect to the contents of the files. But your goal is just to avoid data loss if a file changes unexpectedly during the deduplication process. That's amenable to other approaches, like keeping a hardlink to the old file, performing the rename, comparing to detect that the data changed, and restoring the old file (either to the original name, or to a special recovery area) if it did.
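A sketch of that scheme, assuming both files live on the same filesystem (so `link()` works) and using an illustrative `contents_equal()` helper rather than any real API:

```c
/* Sketch of the recovery scheme described above: pin the old inode
 * of "foo" with an extra hard link, do the rename, then compare.
 * If foo's old data still matches the new foo (they were supposed
 * to be duplicates), the link is dropped; if not, the old data
 * survives under recovery_name. All names are illustrative. */
#include <stdio.h>
#include <unistd.h>

/* Illustrative helper: byte-by-byte file comparison. */
static int contents_equal(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    int ca = EOF, cb = EOF, eq = (fa && fb);
    if (eq) {
        do { ca = getc(fa); cb = getc(fb); } while (ca == cb && ca != EOF);
        eq = (ca == cb);
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return eq;
}

int replace_with_recovery(const char *foo, const char *bar,
                          const char *recovery_name)
{
    if (link(foo, recovery_name) != 0)  /* pin the old data */
        return -1;

    if (rename(bar, foo) != 0) {        /* the atomic name swap */
        unlink(recovery_name);
        return -1;
    }

    if (contents_equal(foo, recovery_name)) {
        unlink(recovery_name);          /* nothing was lost */
        return 0;
    }
    /* foo changed under us; its old data survives at recovery_name
     * for the caller (or the user) to deal with. */
    return 1;
}
```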
However, there are a lot of fundamental issues that still make this problematic, starting with at least the following:
- A process may have an open handle for write on the old file, without having modified it yet, and may modify and close it after you deduplicate it. In that case, the close operation will orphan it and the data will be lost.
- Any process intending to modify one of the files being deduplicated will modify all duplicates at the same time once they're hard-linked, probably contrary to your expectation.
If your goal is deduplicating to save space while keeping the semantics that allow independent modification, you really need a filesystem that deduplicates filesystem blocks with copy-on-write semantics, not hard links. On the other hand, if you want hard links, you should treat the whole tree being deduplicated as essentially read-only during and after the deduplication operation.
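If that route is available, Linux exposes block-level dedup directly as the `FIDEDUPERANGE` ioctl (supported by btrfs and XFS, among others): the kernel verifies that the given ranges are byte-identical and only then shares the extents, so a concurrent modification makes the call report `FILE_DEDUPE_RANGE_DIFFERS` instead of losing data. That is essentially the fail-instead-of-clobber semantics the question asks for. A minimal sketch:

```c
/* Sketch: share the blocks of DEST with SRC via FIDEDUPERANGE.
 * The kernel compares the ranges under its own locks and refuses
 * to share them unless they are identical, so no data can be lost. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s SRC DEST\n", argv[0]);
        return 2;
    }
    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_RDWR);
    struct stat st;
    if (src < 0 || dst < 0 || fstat(src, &st) < 0) {
        perror("open/stat");
        return 1;
    }

    /* One destination range; info[] may describe several files.
     * (The kernel caps each call at 16 MiB, so real tools loop.) */
    struct file_dedupe_range *r =
        calloc(1, sizeof *r + sizeof(struct file_dedupe_range_info));
    if (!r)
        return 1;
    r->src_offset = 0;
    r->src_length = st.st_size;
    r->dest_count = 1;
    r->info[0].dest_fd = dst;
    r->info[0].dest_offset = 0;

    if (ioctl(src, FIDEDUPERANGE, r) < 0) {
        perror("FIDEDUPERANGE");
        return 1;
    }
    if (r->info[0].status == FILE_DEDUPE_RANGE_DIFFERS) {
        fprintf(stderr, "contents differ, nothing shared\n");
        return 1;
    }
    printf("deduplicated %llu bytes\n",
           (unsigned long long)r->info[0].bytes_deduped);
    return 0;
}
```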