Search code examples
sqlpostgresqlgoogle-bigquery

EXCEPT showing additional fields


Let's say I have two tables (from uploaded csv files) and I want to do a diff based on an id+territory in the new file that wasn't in the old file. The easiest way to do this is something like:

SELECT id, territory FROM this_week EXCEPT SELECT id, territory FROM last_week

However, what I'm trying to do is get ALL fields (in both tables -- one row per key) that are generated by that difference. How could this be done?

Either postgres or bigquery is fine. Both have the EXCEPT set op.


An example with data from Erwin's answer:

WITH this_week (id,territory,name,other) AS (VALUES(1,'us','titanic','uhd'),(22,'us','spider','hd'),(3,'fr','new','hd')),
     last_week (id,territory,name,other) AS (VALUES(1,'us','titanic','uhd'),(2,'us','spider','hd'))
SELECT *  -- all columns of "this_week"
FROM   this_week t
WHERE  NOT EXISTS (
   SELECT * FROM last_week l
   WHERE  t.id = l.id
   AND    t.territory = l.territory
   );

Solution

  • Use NOT EXISTS:

    SELECT *  -- all columns of "this_week"
    FROM   this_week t
    WHERE  NOT EXISTS (
       SELECT FROM last_week l
       WHERE  t.id = l.id
       AND    t.territory = l.territory
       );
    

    And index on last_week (id, territory) would typically help performance (a lot).

    This shows all columns of this_week.
    I don't see the point in adding columns of last_week, which would be empty (null) by definition of the query if you'd left-join.

    Basics:

    Note a subtle difference:

    EXCEPT (when used without ALL) folds duplicates. This query does not. You may want one or the other. Typically, you want this.