In Bigquery can I join two tables using Contains/Not Contains with wildcards?

I am trying to join two tables where values from the first table need to be included in values in one field from the second table but excluded from values in a different field. All fields are non-array strings.

As an example, I have table1 that I wanted joined to table2 on the conditions where table1.field1 is in table2.field1yes but not in table2.field1no.

table1 has rows that should join with rows in table2 as follows:

table1.field1 = '123100'
- table2.field1yes = '___100,___200'
- table2.field2no = Null
table1.field1 = '321300'
- table2.field1yes = %
- table2.field1no = '___100,___200'

Where _ represents a single character wildcard and % represents 0 or more character wildcard

Written in code, I've tried to do the following to join the tables in the example above:

WITH
  table1  AS (SELECT '123100' field1 UNION ALL
              SELECT '321300'),
  table2  AS (SELECT 'data1' data, 'r\'^([\\d]+100|[\\d]+200)$\'' field1yes, '' field1no UNION ALL
              SELECT 'data2',      'r\'^([0-9A-Za-z])$\'',                   'r\'^([\\d]+100|[\\d]+200)$\'')

SELECT
  *
FROM table1
JOIN table2
ON 0=0
  AND     REGEXP_CONTAINS(table1.field1, table2.field1yes)
  AND NOT REGEXP_CONTAINS(table1.field1, table2.field1no)

But it returns no results, even though the following does returns the first row from table1 a result as expected:

SELECT
  *
FROM table1
WHERE 0=0
  AND     REGEXP_CONTAINS(table1.field1, r'^([\d]+100|[\d]+200)$')

and the following returns the second row from table1:

SELECT
  *
FROM table1
WHERE 0=0
  AND NOT REGEXP_CONTAINS(table1.field1, r'^([\d]+100|[\d]+200)$')

Is this the right approach? If so, what tweaks do I need to make to my query to return results? If not, is there a different solution to match these two tables?

Solution

Below is for BigQuery Standard SQL

Is this the right approach?

Yes, you can preset your regexp and then use it in REGEXP functions

what tweaks do I need to make to my query to return results?

I hope below makes it very clear what was "wrong" in your original query. I mostly mean within TCE for table2 (specifically - field1yes and field1no)

#standardSQL
WITH table1  AS (
  SELECT '123100' field1 UNION ALL
  SELECT '321300'
), table2  AS (
  SELECT 'data1' data, r'^([\d]+100|[\d]+200)$' field1yes, '' field1no UNION ALL
  SELECT 'data2',      r'^([0-9A-Za-z])$',                   r'^([\d]+100|[\d]+200)$'
)
SELECT *
FROM table1
JOIN table2
ON REGEXP_CONTAINS(table1.field1, table2.field1yes)
OR NOT REGEXP_CONTAINS(table1.field1, table2.field1no)

Note: I focused on fixing the main issue with your initial query which was a representation of regexp expressions and assumed that the rest (logic of joining, etc.) is up to you to tune