Search code examples
sqlmysqlchecksum

Create an aggregate checksum of a column


I want to compute a checksum of all of the values of a column in aggregate.

In other words, I want to do some equivalent of

md5(group_concat(some_column))

The problem with this approach is:

  1. It's inefficient. It has to concat all of the values of the column as a string in some temporary storage before passing it to the md5 function
  2. group_concat has a max length of 1024, after which everything else will be truncated.

(In case you're wondering, you can ensure that the concat of the values is in a consistent order, however, as believe it or not group_concat() accepts an order by clause within it, e.g. group_concat(some_column order by some_column))

MySQL offers the nonstandard bitwise aggregate functions BIT_AND(), BIT_OR() and BIT_XOR() which I presume would be useful for this problem. The column is numeric in this case but I would be interested to know if there was a way to do it with string columns.

For this particular application, the checksum does not have to be cryptologically safe.


Solution

  • It seems like you might as well use crc32 instead of md5 if you don't care about cryptographic strength. I think this:

    select sum(crc32(some_column)) from some_table;
    

    would work on strings. It might be inefficient as perhaps MySQL would create a temporary table (especially if you added an order by).