Search code examples
windowsfileidentityuniqueidentifieridentify

How can you identify a file without a filename or filepath?


If I were to give you a file. You can read the file but you can't change it or copy it. Then I take the file, rename it, move it to a new location. How could you identify that file? (Fairly reliably)

I'm looking if I have a database of media files for a program and the user alters the location/name of file, could I find the file by searching a directory and looking for something.


Solution

  • I have done exactly this, it's not hard.

    I take a 256-bit hash (I forget which routine I used off the top of my head) of the file and the filesize and write it to a table. If they match the files match. (And I think tracking the size is more paranoia than necessity.) To speed things up I also fold that hash to a 32-bit value. If the 32-bit values match then I check all the data.

    For the sake of performance I persist the last 10 million files I have examined. The 32-bit values go in one file which is read in it's entirety, when a main record needs to be examined I pull in a "page" (I forget exactly how big) of them which is padded to align it with the disk.