
How to determine if two files are hard-linked to the same data?


I've written an extension method for the System.IO.FileInfo class to create hard links, and it goes like this:

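using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

public static class FileInfoExtensions {
    [DllImport("Kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool CreateHardLink(string lpFileName, string lpExistingFileName, IntPtr lpSecurityAttributes);

    public static void CreateHardLink(this FileInfo file, string destination) {
        // CreateHardLink returns false on failure; surface the Win32 error instead of silently ignoring it
        if (!CreateHardLink(destination, file.FullName, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}

// Usage:
fileInfo.CreateHardLink(@".\hardLinkCopy.txt");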

The method works fine, but I'd like to make some unit tests just for the sake of it. So how can I assert that a file x and another file y are linked to the same data?

I came up with some ways to test it:

  • Check that the data stays consistent through changes. Since creating a hard-link copy just gives a second name to a file, any modification made through the first name will be reflected in the second one, and vice versa. If the data stays consistent between the two files despite modifications, it's safe to assume that both are hard-linked to the same data.
  • Assert that creating a hard link doesn't affect the parent folder's size. Since a hard-link copy doesn't copy any data on disk, the parent directory shouldn't get any heavier. If, upon calling the method, a new file is created with the same content as the original file and the parent folder didn't change in size (or gained less than a normal copy would), the new file must be a hard-link copy.
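The first idea can be turned into a reusable test helper. This is a minimal sketch, assuming small plain-text (ASCII/UTF-8) test files; the class and method names (HardLinkHeuristics, ChangesAreShared) are hypothetical. It appends a random marker through one name, checks whether the marker is visible through the other name, and then truncates the file back to its original length. It still only gives circumstantial evidence and temporarily modifies the file, so it is meant for throwaway test files.

```csharp
using System;
using System.IO;

public static class HardLinkHeuristics {
    // Hypothetical helper: returns true if a change written through pathA
    // is visible when reading through pathB (as it would be for hard links).
    // Assumes plain ASCII/UTF-8 text files (no UTF-16 byte-order mark).
    public static bool ChangesAreShared(string pathA, string pathB) {
        long originalLength = new FileInfo(pathA).Length;
        string marker = Guid.NewGuid().ToString("N"); // 32 ASCII hex characters
        File.AppendAllText(pathA, marker);
        try {
            return File.ReadAllText(pathB).EndsWith(marker, StringComparison.Ordinal);
        } finally {
            // Undo the change; if the names are linked, this restores both of them
            using (FileStream fs = new FileStream(pathA, FileMode.Open, FileAccess.Write)) {
                fs.SetLength(originalLength);
            }
        }
    }
}
```

In a test, a true result after calling CreateHardLink (and false for a plain File.Copy) is consistent with a hard link, but a lower-level check of the file identity is more conclusive.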

However, these methods smell. There's got to be at least one built-in method somewhere in the OS to check whether two files point to the same data on disk!

Could anyone share a lead?


Solution

  • After following Bennett Yeo's suggestion, I found the following:

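    There's no direct way to check whether two files are linked to the same data, but we can write our own method by comparing the files' unique IDs (the equivalent of inode numbers on UNIX-based systems). In my understanding, this value serves as an index to the actual content on disk. Note that a file ID is only unique within a volume, so a complete check also has to make sure both files are on the same volume.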

    Bennett also linked this thread, which gave me two ways to get a file's unique ID:

    1. The linked answer proposed calling GetFileInformationByHandle from kernel32.dll. As the method's name implies, I must first get a handle for the file, but whenever I try to get one, an exception is thrown saying that the targeted file is being used by another process.
    2. Running the command fsutil file queryfileid <filename> (credit to this answer).
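    Regarding the first option: one possible cause of the "used by another process" error (an assumption, since the code that opened the handle isn't shown) is opening the file with a sharing mode that conflicts with a handle that is already open. The following is a minimal, Windows-only sketch that opens the file with FileShare.ReadWrite | FileShare.Delete and then compares the volume serial number and file index returned by GetFileInformationByHandle. The class and method names (FileIdentity, AreSameFile, GetLinkCount) are hypothetical.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class FileIdentity {
    // Managed mirror of the Win32 BY_HANDLE_FILE_INFORMATION structure
    [StructLayout(LayoutKind.Sequential)]
    private struct BY_HANDLE_FILE_INFORMATION {
        public uint FileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME CreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME LastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME LastWriteTime;
        public uint VolumeSerialNumber;
        public uint FileSizeHigh;
        public uint FileSizeLow;
        public uint NumberOfLinks;
        public uint FileIndexHigh;
        public uint FileIndexLow;
    }

    [DllImport("Kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetFileInformationByHandle(SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION lpFileInformation);

    private static BY_HANDLE_FILE_INFORMATION GetInfo(string path) {
        // Share read, write and delete access so this open does not conflict
        // with other handles to the same file that are still open.
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                   FileShare.ReadWrite | FileShare.Delete)) {
            if (!GetFileInformationByHandle(fs.SafeFileHandle, out BY_HANDLE_FILE_INFORMATION info))
                throw new Win32Exception(Marshal.GetLastWin32Error());
            return info;
        }
    }

    // Two paths name the same file when the volume serial number and the file index both match.
    public static bool AreSameFile(string pathA, string pathB) {
        BY_HANDLE_FILE_INFORMATION a = GetInfo(pathA);
        BY_HANDLE_FILE_INFORMATION b = GetInfo(pathB);
        return a.VolumeSerialNumber == b.VolumeSerialNumber
            && a.FileIndexHigh == b.FileIndexHigh
            && a.FileIndexLow == b.FileIndexLow;
    }

    // Number of names (hard links) that currently point at the file.
    public static uint GetLinkCount(string path) => GetInfo(path).NumberOfLinks;
}
```

    With something like this, a unit test could assert AreSameFile(original, link) after calling CreateHardLink, and that GetLinkCount(original) went from 1 to 2, without shelling out to another process.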

    The second method works for me, so I wrote the following code:

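    using System;
    using System.Diagnostics;
    using System.IO;
    
    public static class FileIdExtensions {
        private static string InvokeShellAndGetOutput(string fileName, string arguments) {
            using (Process p = new Process()) {
                p.StartInfo.UseShellExecute = false;
                p.StartInfo.RedirectStandardOutput = true;
                p.StartInfo.FileName = fileName;
                p.StartInfo.Arguments = arguments;
                p.Start();
                string output = p.StandardOutput.ReadToEnd();
                p.WaitForExit();
                if (p.ExitCode != 0)
                    throw new InvalidOperationException($"{fileName} exited with code {p.ExitCode}: {output}");
                return output;
            }
        }
    
        public static string GetFileId(this FileInfo fileInfo) {
            // Call "fsutil" to get the unique file ID; quote the path in case it contains spaces
            string output = InvokeShellAndGetOutput("fsutil", $"file queryfileid \"{fileInfo.FullName}\"");
    
            // The output looks like "File ID is 0x..." followed by a line break. Keep the hex token
            // as a string: the ID may be wider than 64 bits, which would overflow Convert.ToInt64.
            int start = output.IndexOf("0x", StringComparison.OrdinalIgnoreCase);
            if (start < 0)
                throw new FormatException($"Unexpected fsutil output: {output}");
            return output.Substring(start).Trim().ToLowerInvariant();
        }
    
        public static bool IsHardlinkedToSameData(this FileInfo fileInfo, FileInfo otherFileInfo) {
            // File IDs are only unique per volume (and hard links cannot span volumes), so require the same root too
            bool sameRoot = string.Equals(Path.GetPathRoot(fileInfo.FullName),
                                          Path.GetPathRoot(otherFileInfo.FullName),
                                          StringComparison.OrdinalIgnoreCase);
            return sameRoot && fileInfo.GetFileId() == otherFileInfo.GetFileId();
        }
    }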
    

    It's patchy, but I feel it's already more reliable than my previous ideas. As long as the host running the tests has "fsutil" available (it ships with Windows), it should work.

    More reliable solutions are still welcome.