Search code examples
listgoogle-bigquerycomparison

BigQuery compare two lists


I would like to compare two lists of strings using BigQuery.

DECLARE list ARRAY ; DECLARE list2 ARRAY ;

SET list1 = ['orange','red','black' ]; SET list2 = ['red','black' ];

AS a result, i would like to get 'orange' as it is the value missing in list2.

Can anyone help me please ?

I've tried this solution, but it takes too long. My lists actually contain hundreds of values, so I'd like something more efficient than adding 'as value union all select' in front of each value.

WITH list1 AS (
  SELECT 'orange' AS value UNION ALL
  SELECT 'red' AS value UNION ALL
  SELECT 'black' AS value
)
,
list2 AS (
  SELECT 'red' AS value UNION ALL
  SELECT 'black' AS value 
)

SELECT value
FROM (
  SELECT value, 'list1' AS list_name
  FROM list1
  UNION ALL
  SELECT value, 'list2' AS list_name
  FROM list2
)
GROUP BY value
HAVING COUNT(*) = 1
ORDER BY value

Solution

  • Consider below (BigQuery Standard SQL)

    DECLARE list1, list2 ARRAY<string>;
    SET list1 = ['orange','red','black' ]; SET list2 = ['red','black' ];
    SELECT * FROM (
      SELECT DISTINCT el FROM UNNEST(list1) el
      UNION ALL
      SELECT DISTINCT el FROM UNNEST(list2) el
    )
    GROUP BY el
    HAVING COUNT(*) = 1;    
    

    with output

    enter image description here

    Also consider below approach

    DECLARE list1, list2 ARRAY<string>;
    SET list1 = ['orange','red','black', 'green' ]; SET list2 = ['red','black', 'pink' ];
    SELECT list, ARRAY_AGG(el) el FROM (
      SELECT el, MIN(list) list FROM (
        SELECT DISTINCT el, 'list1' list FROM UNNEST(list1) el
        UNION ALL
        SELECT DISTINCT el, 'list2' FROM UNNEST(list2) el
      )
      GROUP BY el
      HAVING COUNT(*) = 1
    )
    GROUP BY list
    

    with output

    enter image description here