I want to calculate an MD5 hash for an R object. This is usually done on the serialized object. I am aware of two different R libraries that can calculate MD5 hashes: the digest library and the openssl library. But the two return different hash values. Here is an example for the openssl library:
test <- 1:100
library(openssl)
md5(serialize(test, connection = NULL))
# returns: md5 23:a8:b3:40:9e:08:a0:3d:30:6e:3d:3d:cb:fe:21:57
Now the example for the digest library:
library(digest)
digest(test, "md5", serialize = TRUE)
# returns: [1] "83777773fa047247723ad5a255963144"
digest skips the first 14 bytes (the serialization header) when it serializes the object itself.
For example:
> .t <- serialize(test, connection = NULL)
> md5(.t[seq(15, length(.t))])
md5 83:77:77:73:fa:04:72:47:72:3a:d5:a2:55:96:31:44
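The comparison can also be run in the other direction: serialize once yourself and have both libraries hash exactly the same bytes. This is only a sketch. It assumes that digest's skip argument is the mechanism behind the header skipping, and that serialize = FALSE makes digest hash a raw vector as given. The names bytes, hex_openssl and hex_digest are made up for illustration.

```r
library(openssl)
library(digest)

test <- 1:100
bytes <- serialize(test, connection = NULL)  # raw vector, header included

# openssl: md5() on a raw vector returns the raw hash bytes; format them as hex
hex_openssl <- paste(as.character(unclass(md5(bytes))), collapse = "")

# digest: serialize = FALSE hashes the raw vector as given,
# and skip = 0 makes sure no leading bytes are dropped
hex_digest <- digest(bytes, algo = "md5", serialize = FALSE, skip = 0)

identical(hex_openssl, hex_digest)
```

Because both calls see the same raw vector, this does not depend on which serialization settings each library would choose by default.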
The result of serialize(1:100, connection = NULL) differs between R versions.
According to the source code of base::serialize, R writes some integers that represent the R version at the start of the serialized output.
digest::digest skips these bytes before calculating the MD5 sum, so the result stays consistent.
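To see what those leading bytes contain, the header can be decoded by hand. The sketch below assumes the default binary (XDR) format, with a 2-byte format marker followed by three big-endian 4-byte integers: the serialization format version, the writing R version, and the minimal R version able to read the data. That layout is my reading of the format, not something stated above, and the variable names are illustrative.

```r
# Serialize with the default binary format and keep the first 14 bytes
hdr <- serialize(1:100, connection = NULL)[1:14]

# Bytes 1-2: format marker, "X\n" for XDR
format_marker <- rawToChar(hdr[1:2])

# Bytes 3-6: serialization format version as a big-endian integer
ser_version <- readBin(hdr[3:6], what = "integer", size = 4, endian = "big")

# Bytes 7-10: the writing R version, one component per byte (major, minor, patch)
writer_version <- sprintf("%d.%d.%d",
                          as.integer(hdr[8]),
                          as.integer(hdr[9]),
                          as.integer(hdr[10]))

# Compare against the R version of the running session
writer_version == as.character(getRversion())
```

Since bytes 7-10 change with every R release, hashing the full serialized vector ties the hash to the R version that produced it.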