Search code examples
mysqldynamichashmd5information-schema

Dynamic row hash based on information_schema for MySQL


We process a large amount of data on a daily basis & one way we reduce processing time is by skipping any records that have not changed from the previous day. We do this by calculating an MD5 hash across all the columns we want to detect changes in & compare it to yesterdays row hash. The problem is that every time we add a column to the load we need to remember to add it to the row hash calculation. This does not always happen & is unnecessary duplicated work. To make this automatic i wanted to use the information schema to build my list of columns dynamically. I spent longer than i would like to admit searching for an answer and eventually building my own. Hopefully this helps someone else.


Solution

  • We need to set a variable with the dynamic sql so it needs to be done using a procedure. Replace the table name & the excluded columns from the WHERE clause and you should be good to go. This version updates a column called "rowHash" in the same table we are calculating the row hash for but can be modified for other needs.

    DROP PROCEDURE IF EXISTS sp_set_dynamic_rowhash;
    CREATE PROCEDURE sp_set_dynamic_rowhash()
        SQL SECURITY INVOKER
    BEGIN
    
    SET @sql = (SELECT CONCAT('UPDATE TableName SET rowHash = MD5(CONCAT(',GROUP_CONCAT('IFNULL(LTRIM(RTRIM(',column_name,')),"")'), '))')
                FROM information_schema.columns 
                WHERE table_name = 'TableName'
                AND table_schema = 'TableSchema'
                AND column_name NOT IN ('excluded_col_1','excluded_col_2','excluded_col_3', 'Rowhash')
               );
             
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    
    END;