Search code examples
mysqlduplicateschecksum

Checksum option in MYSQL to assign to each field


I've seen examples using crc, but the examples I've got my head round work with generating a checksum for entire table data (for use with replication and data validation). For example:

SELECT  crc
FROM
(
  SELECT @r := MD5(CONCAT(property_id, asking_price, postcode, 
address_city, @r)) AS crc,
     @c := @c + 1 AS cnt
  FROM
(
SELECT @r := '', @c := 0
) rc,
(
SELECT property_id, asking_price, postcode, address_city
FROM table
WHERE property_type_id = 2
AND listing_type = 'BUY'
ORDER BY
  address_postalcode
) k
) ci
WHERE cnt = @c

I'm trying to assign a checksum value to each entry in a MYSQL database, based on a series of field values (shown above). The idea being that when a specific set of fields have the same data in, the checksum generated would be the same and I can deal with duplicate entries by checking the checksum value, instead of lengthy checks of the field to see if they contain matching data. I've seen this in SQL SERVER, using checksum_binary, which is so fast but is there a better solution to apply a checksum value to individual fields for comparison or should I stick with trying to modify the above?


Solution

  • I don't understand why you are doing something this complicated :

    A checksum for each rows with 3 columns :

    SELECT id, MD5(concat(COALESCE(col1, "0"), COALESCE(col2, "0"), COALESCE(col3, "0"))) 
    AS md5_checksum FROM tbl_name;
    

    To insert this new checksum : Alter the table to add the new column then

    UPDATE into table_name t1,
    (SELECT id, MD5(concat(COALESCE(col1, "0"), COALESCE(col2, "0"), COALESCE(col3, "0"))) 
        AS md5_checksum FROM tbl_name) t2
    SET t1.mynewchecksum = t2.md5_checksum
    WHERE t1.id = t2.id
    

    A note the checksum won't be 100% safe as if the last character of 1 column is not there but find in the next column the hash will be the same. Maybe you can add a separator in the concat to be sure :

    UPDATE into table_name t1,
        (SELECT id,
         MD5(concat_ws("-",COALESCE(col1, "0"), COALESCE(col2, "0"), COALESCE(col3, "0"))) 
            AS md5_checksum FROM tbl_name) t2
    SET t1.mynewchecksum = t2.md5_checksum
    WHERE t1.id = t2.id