
Is effective usage of hashes of changing audio (mp3) files possible?


I'm going to create a music library program, easy. Storing the information, easy.

I previously looked at another music library written in C#. Its author claimed that even if you move a file, on rediscovery the library will still know all the information about that file from its database (XML, SQL).

More info on rediscovery: when you move files, you have to make the music library rediscover them, because its stored information (such as the file path) is now wrong. On rediscovery it finds the file, looks it up in the database, and updates any information that changed.

I thought this was impossible, until now. If you hash a file and use that hash as the key, you can always check a file against it to make sure it is the same one.

Please correct me if I'm wrong and confirm what I'm saying is true (that is the question).

  • The file path isn't used in hashing the file. (I don't know how to hash.)
  • Rehash after every ID3 tag write (changing the file changes the hash?)
  • Using the hash as a key/ID means that if the file is moved, it can still be matched to the information stored about it
  • Once the information is read out of the XML file (if we're using XML as a database), storing it in a dictionary is the quickest and best way to hold the contents in memory

It is a question, it needs an answer, and it's about C#. I'm using C#, that's why it's specific. I'm doing background research and just wanted some expert opinion on what I've stated.


Solution

  • Answering your questions

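    • the file path should not be used when computing the hash, and neither should the filename or the extension. Hash only the file's contents.

    • rehashing after each ID3 tag write would solve your problem, provided that all changes are made through your application. Because the new hash becomes the file's key, the stored record has to be re-keyed at the same time.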

    • hash can safely be used as a key for your purposes (see below)

    • probably yes, if I understand you correctly
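The first, third, and fourth points can be sketched together in C#. This is only a minimal illustration, assuming SHA-256 as the hash function and an in-memory `Dictionary` keyed by the hex digest. The names `ContentHash`, `TrackInfo`, and `Library.Rediscover` are hypothetical and do not come from the library mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes the bytes of a file.
public static class ContentHash
{
    // The path is only used to open the file; it is never fed to the hash,
    // so moving or renaming the file does not change the result.
    public static string OfFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] digest = sha.ComputeHash(stream);
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}

// Hypothetical record for whatever the library stores about a track.
public sealed class TrackInfo
{
    public string Title;
    public string Path;
}

public static class Library
{
    // On rediscovery: look the file up by its content hash and,
    // if it is known, refresh the stored path.
    public static bool Rediscover(Dictionary<string, TrackInfo> byHash, string foundPath)
    {
        TrackInfo info;
        if (!byHash.TryGetValue(ContentHash.OfFile(foundPath), out info))
        {
            return false; // unknown file: the caller would add a new record
        }
        info.Path = foundPath;
        return true;
    }
}
```

Note that this hashes the whole file, so a tag edit made outside the application would change the hash and the lookup would fail.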

    Possibility of repeated hash value

    Depending on the hashing function you choose, finding or generating another file with the same hash might take a year, a millennium, or a billion years, or it might never happen before the end of the world.

    It's all a matter of probabilities. Check details of each hashing function to learn how low the probability of finding another file with the same hash is.
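As a rough guide to those probabilities, the standard birthday approximation can be used. It assumes the hash behaves like a uniformly random b-bit value and covers accidental collisions only; MD5 and SHA-1 have known deliberate collision attacks, which this estimate does not account for. The library size of one million files is an illustrative assumption.

```latex
% Probability that at least two of n files share a b-bit hash (valid when p is small)
p \approx \frac{n(n-1)}{2 \cdot 2^{b}}

% Example: n = 10^{6} files
% b = 128 (e.g. MD5):     p \approx \frac{10^{12}}{2^{129}} \approx 1.5 \times 10^{-27}
% b = 256 (e.g. SHA-256): p \approx \frac{10^{12}}{2^{257}} \approx 4.3 \times 10^{-66}
```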

    Problem of changed tags in mp3 files

    While this may be a problem, what you need to do is hash only the part of the file that is not an ID3 tag. Tags usually take up a small fraction of the file size. An ID3v1 tag is the last 128 bytes of the file and starts with "TAG". An ID3v2 tag usually sits at the start of the file, starts with "ID3", and stores its own length in its header.

    What you can do is apply the hashing function only to the part of the file that will not change: skip the ID3v2 tag at the start, if there is one, and the ID3v1 tag at the end.
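A minimal C# sketch of hashing only the audio bytes, assuming the common ID3 layout: an optional ID3v2 tag at the start of the file (10-byte header whose last 4 bytes hold a "synchsafe" size, 7 bits per byte) and an optional ID3v1 tag in the last 128 bytes. The class name `AudioHash` is hypothetical, SHA-256 is an arbitrary choice, and the sketch does not handle other tag formats such as APEv2 or Lyrics3.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes an MP3 while skipping ID3 tags.
public static class AudioHash
{
    public static string OfFile(string path)
    {
        // Reads the whole file into memory; fine for typical MP3 sizes.
        return OfBytes(File.ReadAllBytes(path));
    }

    public static string OfBytes(byte[] data)
    {
        int start = 0;
        int end = data.Length;

        // ID3v2: "ID3", 2 version bytes, 1 flags byte, 4 synchsafe size bytes.
        // The size excludes the 10-byte header (and the optional 10-byte footer).
        if (data.Length >= 10 && data[0] == 'I' && data[1] == 'D' && data[2] == '3')
        {
            int size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14)
                     | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F);
            bool hasFooter = (data[5] & 0x10) != 0; // footer flag, ID3v2.4 only
            start = Math.Min(data.Length, 10 + size + (hasFooter ? 10 : 0));
        }

        // ID3v1: the final 128 bytes begin with "TAG".
        if (end - start >= 128
            && data[end - 128] == 'T' && data[end - 127] == 'A' && data[end - 126] == 'G')
        {
            end -= 128;
        }

        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(data, start, end - start);
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

With a hash like this, editing a title or artist leaves the key unchanged, so rehashing after every tag write is no longer needed for the tag formats the sketch recognises.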