Search code examples
mysqlfunctionstored-proceduresmariadbexcept

Alternative for EXCEPT for MariaDB/MySQL comparing all columns


I know MariaDB and MySQL don't support EXCEPT. I would like to find an alternative to something like this:

SELECT * FROM table
EXCEPT
SELECT * FROM backup_table

Where the table and backup_table have the same schema.

All the posts I've seen suggests that I compare a single column using "WHERE column IN (...)". The problem in my case is that I need to compare all the columns between the two tables for every table. I'm hoping to write this as procedure or function looping through all the tables, looking for any changes in the database. Basically, I want to find out all the records that have been updated or inserted in all my tables.


Solution

  • If I was faced with that task, I'd use an anti-join pattern. That's an outer join, to return all rows from the current table, along with "matching" rows from the backup table. Then in the WHERE clause, we exclude all rows that had an exact match. Returning rows that don't match.

      SELECT t.*
        FROM mytable t
        LEFT
        JOIN backup_mytable s
          ON s.id        <=> t.id
         AND s.col_two   <=> t.col_two
         AND s.col_three <=> t.col_three
         AND ... 
     WHERE s.id IS NULL
    

    This assumes that the column id is guaranteed to be non-NULL. The PRIMARY KEY column (or any column that is part of the PRIMARY KEY of the table, or any column that has a NOT NULL constraint would serve.)

    This query only returns the rows that don't match a row in backup table. It doesn't indicate whether its row that doesn't exist, or whether a value of a column was changed.

    And to get rows in the original table that don't match rows in the backup table, just swap the table names.

    For the special case of a table with all columns defined as NOT NULL, we could take a shortcut on the join predicates.

        FROM mytable t
     NATURAL
        LEFT
        JOIN backup_mytable s
       WHERE s.id IS NULL
    

    That's equivalent to a LEFT JOIN with a USING clause of all columns that are named the same in both tables.

        FROM mytable t
        LEFT
        JOIN backup_mytable s
       USING (id, col_two, col_three, ...)
      WHERE s.id IS NULL
    

    That's equivalent to specifying an equality comparison on every column (if both tables have the same columns)

        FROM mytable t
        LEFT
        JOIN backup_mytable s
          ON s.id        = t.id
         AND s.col_two   = t.col_two
         AND s.col_three = t.col_three
    

    Any occurrences of NULL values in any of the columns are going to screw with the equality comparison, and return NULL.

    And that's why the first query uses the null-safe comparison <=> (spaceship) operator. NULL <=> NULL will return TRUE, where NULL = NULL will return NULL.

    For that first query pattern, rather than tediously typing out all of those comparisons of every column, I would use SQL to help me generate the SQL I need.

     SELECT CONCAT('   AND s.`',c.column_name,'` <=> t.`',c.column_name,'`') AS `-- stmt`
       FROM information_schema.columns c
      WHERE c.table_schema = 'mydatabase'
        AND c.table_name = 'mytable'
      ORDER BY c.ordinal_position
    

    I'd take the rows returned by that query, and paste that in

    SELECT t.*
      FROM ... t
      JOIN ... s
        ON 1=1
        -- paste here --
     WHERE s.id IS NULL
    ORDER BY t.id
    

    If I needed query that matched on just the id column, and needed to identify which columns had changed, I'd use expressions in the SELECT list. For example:

     SELECT s.`id`        <=> t.`id`         AS `match_id`
          , s.`col_one`   <=> t.`col_one`    AS `match_col_one`
          , s.`col_three` <=> t.`col_three`  AS `match_col_three`
      FROM mytable t
      JOIN backup_mytable s
        ON s.id = t.id
    HAVING NOT match_col_one
    

    Here referencing the column alias in the SELECT list in a HAVING clause, to exclude rows that have the same value of col_one; returning rows where col_one is different.

    Again, I would use SQL against information_schema.columns to help speed up the query writing process.