I am trying to join two tables where values from the first table need to be included in values in one field from the second table but excluded from values in a different field. All fields are non-array strings.
As an example, I have table1 that I wanted joined to table2 on the conditions where table1.field1 is in table2.field1yes but not in table2.field1no.
table1 has rows that should join with rows in table2 as follows:
Where _ represents a single character wildcard and % represents 0 or more character wildcard
Written in code, I've tried to do the following to join the tables in the example above:
WITH
table1 AS (SELECT '123100' field1 UNION ALL
SELECT '321300'),
table2 AS (SELECT 'data1' data, 'r\'^([\\d]+100|[\\d]+200)$\'' field1yes, '' field1no UNION ALL
SELECT 'data2', 'r\'^([0-9A-Za-z])$\'', 'r\'^([\\d]+100|[\\d]+200)$\'')
SELECT
*
FROM table1
JOIN table2
ON 0=0
AND REGEXP_CONTAINS(table1.field1, table2.field1yes)
AND NOT REGEXP_CONTAINS(table1.field1, table2.field1no)
But it returns no results, even though the following does returns the first row from table1 a result as expected:
SELECT
*
FROM table1
WHERE 0=0
AND REGEXP_CONTAINS(table1.field1, r'^([\d]+100|[\d]+200)$')
and the following returns the second row from table1:
SELECT
*
FROM table1
WHERE 0=0
AND NOT REGEXP_CONTAINS(table1.field1, r'^([\d]+100|[\d]+200)$')
Is this the right approach? If so, what tweaks do I need to make to my query to return results? If not, is there a different solution to match these two tables?
Below is for BigQuery Standard SQL
Is this the right approach?
Yes, you can preset your regexp and then use it in REGEXP functions
what tweaks do I need to make to my query to return results?
I hope below makes it very clear what was "wrong" in your original query. I mostly mean within TCE for table2 (specifically - field1yes and field1no)
#standardSQL
WITH table1 AS (
SELECT '123100' field1 UNION ALL
SELECT '321300'
), table2 AS (
SELECT 'data1' data, r'^([\d]+100|[\d]+200)$' field1yes, '' field1no UNION ALL
SELECT 'data2', r'^([0-9A-Za-z])$', r'^([\d]+100|[\d]+200)$'
)
SELECT *
FROM table1
JOIN table2
ON REGEXP_CONTAINS(table1.field1, table2.field1yes)
OR NOT REGEXP_CONTAINS(table1.field1, table2.field1no)
Note: I focused on fixing the main issue with your initial query which was a representation of regexp expressions and assumed that the rest (logic of joining, etc.) is up to you to tune