Given the following situation:
- `source`, a PDF freshly received via a file transfer
- `target`, which is saved as an attachment blob using `active_storage` and is versioned

I want to check whether any existing version of `target` is binary-equal to `source`. Without `active_storage`, I'd have computed a SHA256 sum of every blob that enters the DB. To compare, I'd have checked the fresh SHA256 sum of `source` against each checksum stored for any version of `target`.
However, the `.checksum` method of `active_storage` attachments and blobs appears to be neither an MD5 nor a SHA256 sum. For instance, I get `Cr4IxYNF7v7cJao1EiiBEw==` for some file.
One solution would be to use something like `Digest::SHA256.hexdigest(Person.find(46).photo.download)`, but the performance would be terrible, since every stored blob would have to be downloaded and hashed for each comparison.
How can I efficiently search my `active_storage` "database"?
According to the ActiveStorage source, the checksum is in fact an MD5 digest, but it has been base64-encoded.
From the source at: https://github.com/rails/rails/blob/8da6ba9cae21beae1ee3c379db7b7113d2731c9b/activestorage/app/models/active_storage/blob.rb#L313
    def compute_checksum_in_chunks(io)
      Digest::MD5.new.tap do |checksum|
        while chunk = io.read(5.megabytes)
          checksum << chunk
        end
        io.rewind
      end.base64digest
    end
So hopefully you should be able to base64-encode the MD5 digest of your incoming file and compare it against the checksums already stored in the database, without downloading any blob.
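As a rough sketch of that idea (not taken from the Rails source): the helper below reproduces the chunked, base64-encoded MD5 in plain Ruby, so the digest of `source` can be computed once and matched against stored checksums. The names `active_storage_style_checksum` and `checksum_to_hex` are hypothetical, and the query in the trailing comment assumes a standard ActiveStorage setup with the `Person`/`photo` model from the question.

```ruby
require "digest"
require "stringio"

# Hypothetical helper: mirrors the chunked MD5 + base64 approach quoted above,
# using plain Ruby (5 * 1024 * 1024 instead of ActiveSupport's 5.megabytes).
def active_storage_style_checksum(io, chunk_size: 5 * 1024 * 1024)
  digest = Digest::MD5.new
  while (chunk = io.read(chunk_size))
    digest << chunk
  end
  io.rewind
  digest.base64digest
end

# Hypothetical helper: turns a stored base64 checksum into the familiar
# 32-character hex form, e.g. to compare with the output of `md5sum`.
def checksum_to_hex(base64_checksum)
  base64_checksum.unpack1("m").unpack1("H*")
end

# Small demo on in-memory data standing in for the received PDF.
source_io = StringIO.new("example PDF bytes")
puts active_storage_style_checksum(source_io)
puts checksum_to_hex("Cr4IxYNF7v7cJao1EiiBEw==")

# Inside a Rails app, the lookup could then look something like this
# (assumption: the file sits at `path`, and the versions of `target` are
# ordinary ActiveStorage blobs):
#
#   checksum = File.open(path, "rb") { |f| active_storage_style_checksum(f) }
#   ActiveStorage::Blob.where(checksum: checksum).exists?
```

This way only the incoming file is hashed; the comparison itself is a single query on the `checksum` column. If the table is large, it may be worth checking whether that column is indexed in your schema, and narrowing the query to the blobs that belong to `target` rather than all blobs.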