Tags: ruby, linux, windows, hash, checksum

Calculating a file's integrity and compressing it in Ruby


What is the preferred method of verifying a file's integrity after transferring it over the network in Ruby?

I am writing software that will break a file into chunks, calculate a checksum for each chunk, and send the chunks along with their checksums to a client (probably not in that exact order or manner, but checksum calculation will be involved nonetheless).

I am looking for hashing functions that are available in Ruby's standard library and can run in a timely manner for a given chunk size. The files being transferred are normally around 500 MB but are sometimes 7 GB or larger, so the hashing function must be able to get through either a small number of large chunks or a large number of small chunks fairly quickly.

I have looked into CRC32 via Zlib and MD5 from Digest, but I am wondering whether there are other functions I should consider that may be faster, or which of these would be best suited to the task.

Also, what is the best compression method available in Ruby that would be efficient enough for network file transfers and could save some transfer time?


Solution

  • File Integrity

    MD5 is generally the fastest of the common cryptographic hash functions, but it is now considered broken from a cryptographic security standpoint.

    After MD5, SHA-1 was quite popular (and slightly slower), but it too is now considered inadequate from a cryptographic security perspective.

    require 'digest/sha1'
    Digest::SHA1.hexdigest("hello world")
    

    We now have the SHA-2 family, most commonly used as SHA-256. It is slightly slower than SHA-1 but is still considered cryptographically secure. Ruby's Digest::SHA2 produces a 256-bit (SHA-256) digest by default.

    require 'digest/sha2'
    Digest::SHA2.hexdigest("hello world")
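
    The one-line calls above hash a string that is already in memory. For files in the 500 MB to 7 GB range mentioned in the question, it is probably better to feed the digest in pieces. The following is a minimal sketch of that idea; the method name `file_sha256` and the 1 MiB read size are assumptions chosen for illustration, not part of any library.

```ruby
require 'digest'

# Hypothetical helper: streams the file at `path` through SHA-256 in
# `chunk_size`-byte reads, so only one chunk is held in memory at a time.
def file_sha256(path, chunk_size = 1024 * 1024)
  digest = Digest::SHA256.new
  File.open(path, 'rb') do |f|
    # IO#read with a positive length returns nil at end of file.
    while (chunk = f.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end
```

    The standard library also offers `Digest::SHA256.file(path).hexdigest`, which should give the same result without the explicit loop.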
    

    Related StackExchange question:

    https://security.stackexchange.com/questions/34488/using-md5-for-file-integrity-checks

    Even though you are using hashes only for file-integrity checking and not for cryptographic purposes, it is still recommended that you use a secure hash function.
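
    For the chunk-by-chunk scheme described in the question, one possible shape is sketched below. It pairs each chunk with its own SHA-256 digest on the sending side and recomputes the digest on the receiving side. The names `each_chunk_with_checksum` and `chunk_intact?` are hypothetical, and the transport between the two sides is left out entirely.

```ruby
require 'digest'

# Sender side (hypothetical helper): reads `io` in `chunk_size`-byte pieces
# and yields each chunk together with its SHA-256 hex digest.
# Returns an Enumerator when called without a block.
def each_chunk_with_checksum(io, chunk_size)
  return enum_for(:each_chunk_with_checksum, io, chunk_size) unless block_given?

  while (chunk = io.read(chunk_size))
    yield chunk, Digest::SHA256.hexdigest(chunk)
  end
end

# Receiver side (hypothetical helper): recomputes the digest of a received
# chunk and compares it with the digest that was sent alongside it.
def chunk_intact?(chunk, expected_hex)
  Digest::SHA256.hexdigest(chunk) == expected_hex
end
```

    Because chunks are yielded one at a time, memory use depends on the chunk size rather than the file size. A receiver that finds a mismatch only needs to request that one chunk again.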

  • File Compression

    You should be able to use gzip to compress a file stream. You can either compress the file first and then send the compressed file, or compress it on the fly while writing to the socket (this may be wasteful if the same file has to be compressed again for every transfer).

    See http://ruby-doc.org/stdlib-1.9.3/libdoc/zlib/rdoc/Zlib/GzipWriter.html

    To compress the file on the fly, try:

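    require 'zlib'

    # `socket` is assumed to be an already-open IO (e.g. a TCPSocket)
    gz = Zlib::GzipWriter.new(socket)
    gz.write 'jugemu jugemu gokou no surikire...'
    gz.close # writes the gzip trailer and also closes the socket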
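
    To try on-the-fly compression without a network connection, the sketch below streams data through `Zlib::GzipWriter` in fixed-size reads and then decompresses it again with `Zlib::GzipReader`. The helper names `gzip_stream` and `gunzip_stream` and the 64 KiB read size are assumptions for illustration. Any IO-like object can stand in for the socket, for example a `StringIO`.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: copies everything from `source` (anything that
# responds to #read) into `sink` as a gzip stream, `chunk_size` bytes at a time.
def gzip_stream(source, sink, chunk_size = 64 * 1024)
  gz = Zlib::GzipWriter.new(sink)
  while (chunk = source.read(chunk_size))
    gz.write(chunk)
  end
  gz.finish # writes the gzip trailer but leaves `sink` open
  sink
end

# Hypothetical helper: reads a gzip stream from `source` and returns
# the decompressed bytes as a single string.
def gunzip_stream(source)
  Zlib::GzipReader.new(source).read
end
```

    Using `finish` instead of `close` keeps the underlying IO open, which may matter if more data has to be sent on the same connection afterwards. Note that `gunzip_stream` loads the whole result into memory, so for multi-gigabyte files a chunked read on the receiving side would be a better fit.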