Search code examples
sqljoingoogle-bigquerycoalesce

Dynamic conditional joins in BigQuery


I'm trying to write perhaps a dynamic conditional statement in BigQuery to dynamically join tables based on certain conditions. It's possible to have 1 or many conditions for a single ID. How do I dynamically 'filter' a series of ID conditions together (using Table 1, 2 and 3) to get to a set of masterProductIds in to my Final table?

Table 1 - identification_table

Table 1 - identification_table

Table 2 - filing_table

Table 2 - filing_table

Table 3 - Price

Table 3 - Price

In this situation we have 3 conditions to be evaluated for Id 1, 2 conditions for Id 2 and 1 condition for Id 3.

In a situation where we have a price condition as well, it should join to the price table and filter depending on the operator and value in table 1. As for product condition, I don't have to join on any tables, I just take the value as is in table 1.

What I'm expecting as my final result set.

Final Table

Final Table

What I've done as of now:

select masterProductId, row_number() over (partition by id ) sq
from `filing_table` p 
left join `identification_table` pc
on case when subject = 'brand' then p.brandName when subject='category' then categoryName end = pc.boundaryValue
--on p.brandName = pc.boundaryValue or p.categoryName = boundaryValue
left join `price` pp
on p.code = pp.code
where 1=1
and pc.code = 'Id 2'
--and pp.RRP < safe_cast(pc1.boundaryValue as float64)

This would evaluate everything as a whole. I can't figure how to evaluate the IDs in a set by set basis.


Solution

  • Below is for BigQuery Standard SQL and assumes tables to be set as below (with agreement from the OP in the comments to question)

    identification_table

      SELECT 'Id 1' id, 'masterProductId' subject, '=' operator, '1007' value UNION ALL
      SELECT 'Id 1', 'brandName', '=', 'brand p' UNION ALL
      SELECT 'Id 1', 'categoryName', '=', 'category 1' UNION ALL
      SELECT 'Id 2', 'categoryName', '=', 'category 1' UNION ALL
      SELECT 'Id 2', 'price', '<', '130'  UNION ALL
      SELECT 'Id 3', 'categoryName', '=', 'category 3' 
    

    filing_table

      SELECT 11 code, 'category 1' categoryName, 'brand p' brandName, 1001 masterProductId UNION ALL
      SELECT 22, 'category 1', 'brand z', 1002 UNION ALL
      SELECT 33, 'category 2', 'brand c', 1003 UNION ALL
      SELECT 44, 'category 2', 'brand v', 1004 UNION ALL
      SELECT 55, 'category 3', 'brand e', 1005
    

    price

      SELECT 11 code, 3 price UNION ALL
      SELECT 22, 100 UNION ALL
      SELECT 33, 8 UNION ALL
      SELECT 44, 9 UNION ALL
      SELECT 77, 28 
    

    So, below extracts from filing_table those masterProductId's which qualify based on all criteria from identification_table

    EXECUTE IMMEDIATE '''
    SELECT masterProductId
    FROM (
      SELECT f.*, price
      FROM `filing_table` f
      LEFT JOIN `price` p
      USING(code)
    )
    WHERE ''' || (  
      SELECT STRING_AGG('(' || condition || ')', ' OR ')
      FROM (
        SELECT STRING_AGG(FORMAT('(%s %s %s)', subject, operator, value), ' AND ') condition
        FROM `identification_table`,
        UNNEST([IF(subject IN ('price', 'masterProductId'), value, '"' || value || '"')]) value
        GROUP BY id
      ));
    

    If to apply to sample data as in top of the answer - output is

    Row masterProductId  
    1   1001     
    2   1002     
    3   1005