compression · sha · lossless-compression

possible to recreate a file using metadata, SHA, and a few bytes?


I may be a fool, but I hope that it would be possible to recreate an exact copy of a file if we only have:

  • its exact length and type
  • a few of its beginning bytes (suppose 100)
  • its MD5 and SHA-256 signatures.

I do not know if this is possible, but so far I haven't been able to find anything online resembling what I'm describing.

Sidenote: I know such a process would be computationally expensive, so if possible I intend to try it with small files (1-3 MB)


Solution

  • Unfortunately, this is not practical, and given your example of "small files (1-3 MB)", you're way off in terms of the scale current computers can handle.

    A simple Wolfram Alpha query to calculate how long

    • a 128-byte file (256^128 possible combinations of byte sequences)
    • when you can calculate a billion billion billion (10^9 * 10^9 * 10^9 = 10^27) digests every second
    • ... in minutes, then hours, then days, then years

    would take, and you end up with....

    10463951242053391806136963369726580181263718864311851635192874886429209483641954321222640418122029864527291727710479949464718215680589004332016189037791576956967351342601788071700268169006221818240189631008834448226154239518944108944497601509840881752510934060240763835605888507473266002770708660224/1835636794567108154296875 years

    or shortened to ...

    5.7004475357125694689539104223396268823502567825415606695024 × 10^273

    aka a 5 followed by 273 more digits ... years.
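    You can reproduce that arithmetic directly with Python's arbitrary-precision integers (using the same assumed rate of 10^27 digests per second as above):

```python
# Brute-force time estimate for a 128-byte file:
# 256**128 possible byte sequences, checked at an assumed 10**27 digests/second.
combinations = 256 ** 128          # = 2**1024, roughly 1.8 * 10**308
digests_per_second = 10 ** 27      # "a billion billion billion" per second
seconds_per_year = 365 * 24 * 3600

years = combinations // (digests_per_second * seconds_per_year)
print(len(str(years)))             # 274 digits, i.e. roughly 5.7 * 10**273 years
```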

    Even factoring in luck and the average number of attempts before hitting the right sequence, you're at half of that, which is still... unfortunately many orders of magnitude beyond the expected heat death of our universe.

    You would need a targeted attack on the digest algorithm you chose, exploiting known weaknesses. SHA-1 has been "broken" in the sense that targeted collisions have been produced, but most of the other SHA-x implementations are still safe, relatively speaking.

    Now, I picked "billion billion billion" as a rather generous estimate of how many digests you can compute in a second. I have no idea whether this is accurate, but the result with 273 zeroes should tell you that even if I was off by an order of magnitude or ten, you're still nowhere close.
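    If you want your own number instead of my guess, a rough single-threaded benchmark is easy to write with Python's standard hashlib module (real attack rigs use GPUs or ASICs and are many orders of magnitude faster, but at this scale it doesn't matter):

```python
import hashlib
import time

# Rough single-threaded benchmark: how many SHA-256 digests of a
# 128-byte message can this machine compute per second?
data = b"\x00" * 128
count = 200_000

start = time.perf_counter()
for _ in range(count):
    hashlib.sha256(data).digest()
elapsed = time.perf_counter() - start

print(f"{count / elapsed:,.0f} digests/second")
```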

    According to this answer from 2012, computers back then could compute about 100 million SHA-512 hashes per second. If we apply Moore's Law and simply double the performance every 18 months, which I believe was the norm, we could potentially compute 100 million * 2^6 [= 64] hashes today, which is roughly 6.4 billion hashes per second. We're still roughly a billion billion times short of the assumed target rate (which itself leads to a runtime with 273 digits behind it).
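    To see why the answer is "no in practice, yes in principle", here is the brute-force idea at a toy scale: recovering the last 2 unknown bytes of a message from its SHA-256 digest plus a known prefix. (The prefix and tail values here are made up for illustration.) Two unknown bytes is only 256^2 = 65,536 candidates; every additional unknown byte multiplies the search space by 256, which is exactly why a 1-3 MB file is hopeless.

```python
import hashlib
from itertools import product

# Toy demonstration: recover 2 unknown trailing bytes of a message,
# given the known prefix and the SHA-256 digest of the whole message.
known_prefix = b"file starts like this..."
secret_tail = b"\x42\x99"                  # pretend we don't know this
target = hashlib.sha256(known_prefix + secret_tail).digest()

# Exhaustively try all 256**2 = 65,536 possible 2-byte tails.
for candidate in product(range(256), repeat=2):
    tail = bytes(candidate)
    if hashlib.sha256(known_prefix + tail).digest() == target:
        print("recovered tail:", tail)   # prints: recovered tail: b'B\x99'
        break
```

    This finishes in well under a second; bump `repeat=2` to `repeat=3` and it takes minutes. A 1 MB file with a 100-byte known prefix would need `repeat=1_048_476`, which is the 10^273-years territory discussed above.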