ruby-on-rails, ruby, checksum, rails-activestorage

Rails active_storage: Check if an attachment is the same as a given file


Given the following situation:

  • document source of type PDF freshly received via a file transfer
  • document target, which is saved as an attachment blob using active_storage and versioned

I want to check whether any existing version of target is binary-equal to source. Without active_storage, I'd have computed a SHA256 sum of every blob that enters the DB. To compare, I'd then have checked the fresh SHA256 sum of source against the checksum stored for each version of target.

However, the .checksum method of active_storage attachments and blobs appears to be neither an MD5 nor a SHA256 sum. For instance, I get Cr4IxYNF7v7cJao1EiiBEw== for some file.

A solution would be to use something like Digest::SHA256.hexdigest(Person.find(46).photo.download), but the performance would be terrible, since every stored version would have to be downloaded and hashed.

How can I efficiently search my active_storage "database"?


Solution

  • According to the ActiveStorage source, the checksum is in fact an MD5 digest, but it is Base64-encoded rather than hex-encoded.

    From the source at: https://github.com/rails/rails/blob/8da6ba9cae21beae1ee3c379db7b7113d2731c9b/activestorage/app/models/active_storage/blob.rb#L313

    def compute_checksum_in_chunks(io)
      Digest::MD5.new.tap do |checksum|
        while chunk = io.read(5.megabytes)
          checksum << chunk
        end
    
        io.rewind
      end.base64digest
    end
    

    So you should be able to Base64-encode your own MD5 digests (the raw 16-byte digest, not its hex string) and compare them against the checksum stored for each blob in the database.
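
As a rough sketch of that comparison, the helpers below compute the same Base64-encoded MD5 for a local file and convert an existing MD5 hex string to that form. The helper names (active_storage_style_checksum, md5_hex_to_base64, blobs_matching) are made up for illustration, and the lookup assumes a standard Active Storage setup where ActiveStorage::Blob has checksum and byte_size columns; it is not taken from the original answer.

```ruby
require "digest"

# Base64-encoded MD5 of a file on disk, i.e. the same encoding that
# compute_checksum_in_chunks produces. The helper name is hypothetical.
def active_storage_style_checksum(path)
  Digest::MD5.file(path).base64digest
end

# Convert an MD5 hex string you may already store into the Base64 form.
# The hex is packed into raw bytes first; Base64-encoding the hex text
# itself would give a different (non-matching) string.
def md5_hex_to_base64(hex)
  [[hex].pack("H*")].pack("m0")
end

# Hypothetical lookup, only meaningful inside a Rails app with Active Storage:
# blobs whose stored checksum and byte size match the given file.
def blobs_matching(path)
  ActiveStorage::Blob.where(
    checksum: active_storage_style_checksum(path),
    byte_size: File.size(path)
  )
end
```

In a Rails console, something like blobs_matching("/path/to/source.pdf").exists? would then run a single query instead of downloading each version; restricting the result to the versions of target would still need a join or filter on its attachments. Since MD5 is not collision-resistant, a byte-for-byte comparison of the few matching candidates may be worth adding if the files can come from untrusted sources.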