Search code examples
sqldatabasegoogle-bigqueryrelational-databasedata-warehouse

How to check in SQL if multi columnar set is in the table (without string concatenation)


Let's assume I've 3 columns in a table with values like this:

table_1:
   A   |   B   |   C
-----------------------
  'xx' |  ''   |  'y'
  'x'  |  'y'  |  'x'
  'x'  |  'x'  |  'y'
  'x'  |  'yy' |  ''
  'x'  |  ''   |  'yy'
  'x'  |  'y'  |  'y'

I've a result set (result of an SQL SELECT statement) which I want to identify in the above table if it exists there:

[
    ('x', 'x', 'y')
    ('x', 'y', 'y')
]

This result set would match for 5 (of 6) rows in instead of the 2 from the table above if I've compared the results of simple string concatenation, e.g. I would simply compare the results of this: SELECT concat(A, B, C) FROM table_1

I could solve this problem with comparing the results of more complex string concatenation functions like this: SELECT concat('A=', A, '_', 'B=', B, '_', 'C=', C )

BUT:

  • I don't want to use any hardcoded special separator in a string concatenation like _ or =
    • because any character might be in the data
      • e.g.: somewhere in column B there might be this value: xx_C=yy
    • it's not a clean solution
  • I don't want to use string concatenation at all, because it's an ugly solution
    • it makes the "distance" between the attributes disappear
    • not general enough
  • maybe I've columns with different datatypes I don't want to convert to a STRING based column

Question:

Is it possible to solve somehow this problem without using string concatenation? Is there a simple solution for this multi column value checking problem?

I want to solve this in BiqQuery, but I'm interested in a general solution for every relational databse/datawarehouse.

Thank you!


CREATE TABLE test.table_1 (
  A STRING,
  B STRING,
  C STRING
) AS
SELECT * FROM (
  SELECT 'xx', '', 'y'
  UNION ALL
  SELECT 'x', 'y', 'x'
  UNION ALL
  SELECT 'x', 'x', 'y'
  UNION ALL
  SELECT 'x', 'yy', ''
  UNION ALL
  SELECT 'x', '', 'yy'
  UNION ALL
  SELECT 'x', 'y', 'y'
)

SELECT A, B, C
FROM test.table_1
WHERE (A, B, C) IN (    -> I need this functionality
  SELECT 'x', 'x', 'y'
  UNION ALL
  SELECT 'x', 'y', 'y'
);

Solution

  • Below is the most generic way I can think of (BigQuery Standard SQL):

    #standardSQL
    SELECT *
    FROM `project.test.table1` t
    WHERE t IN (
      SELECT t
      FROM `project.test.table2` t
    )
    

    You can test, play with above using sample data from your question as in below example

    #standardSQL
    WITH `project.test.table1` AS (
      SELECT 'xx' a, '' b, 'y' c UNION ALL
      SELECT 'x', 'y', 'x' UNION ALL
      SELECT 'x', 'x', 'y' UNION ALL
      SELECT 'x', 'yy', '' UNION ALL
      SELECT 'x', '', 'yy' UNION ALL
      SELECT 'x', 'y', 'y'
    ), `project.test.table2` AS (
      SELECT 'x' a, 'x' b, 'y' c UNION ALL
      SELECT 'x', 'y', 'y'
    )
    SELECT *
    FROM `project.test.table1` t
    WHERE t IN (
      SELECT t
      FROM `project.test.table2` t
    )
    

    with output

    Row a   b   c    
    1   x   x   y    
    2   x   y   y