Search code examples
google-bigquerylegacy-sql

Google Bigquery Legacy SQL - How to return a null or zero if no results returned?


Say I had the following table

month    region   revenue   
------  -------- ---------- 
 jan     north      100
 feb     north      150
 mar     north      250

How would I be able to query the above table to get the following results?:

month    region   revenue   
------  -------- ---------- 
 jan     north      100
 feb     north      150
 mar     north      250
 apr     north       0
 may     north       0
 jun     north       0

0's can be null's or vice versa. Essentially trying to add empty/null fields into my query (in this case the apr, may, jun rows). Any help would be much appreciated

Thanks


Solution

  • Below is for BigQuery Legacy SQL, but please be advised - it is strongly recommended bu BigQuery Team to migrate to BigQuery Standard SQL

    Below example should give you an idea

    #legacySQL
    SELECT 
      months.month_abr AS month_abr, 
      regions.region AS region, 
      COALESCE(revenues.revenue, 0) revenue
    FROM months
    CROSS JOIN (
      SELECT region FROM revenues
    ) regions
    LEFT JOIN revenues
    ON months.month_abr = revenues.month_abr
    AND regions.region = revenues.region
    -- ORDER BY regions.region, months.month_number
    

    where revenues is your original table with revenues data, month is a table (or you can use subquery as in below example) with list of month

    You can test / play with above using below example with dummy data from your question

    #legacySQL
    SELECT 
      months.month_abr AS month_abr, 
      regions.region AS region, 
      COALESCE(revenues.revenue, 0) revenue
    FROM (
      SELECT month_number, month_abr FROM 
      (SELECT 1 month_number, 'jan' month_abr),
      (SELECT 2 month_number, 'feb' month_abr),
      (SELECT 3 month_number, 'mar' month_abr),
      (SELECT 4 month_number, 'apr' month_abr),
      (SELECT 5 month_number, 'may' month_abr),
      (SELECT 6 month_number, 'jun' month_abr)  
    ) AS months
    CROSS JOIN (
      SELECT region FROM (
        SELECT region FROM 
        (SELECT 'jan' month_abr, 'north' region, 100 revenue),
        (SELECT 'feb' month_abr, 'north' region, 150 revenue),
        (SELECT 'mar' month_abr, 'north' region, 250 revenue)
      ) GROUP BY region
    ) regions
    LEFT JOIN (
      SELECT month_abr, region, revenue FROM 
      (SELECT 'jan' month_abr, 'north' region, 100 revenue),
      (SELECT 'feb' month_abr, 'north' region, 150 revenue),
      (SELECT 'mar' month_abr, 'north' region, 250 revenue)
    ) AS revenues
    ON months.month_abr = revenues.month_abr
    AND regions.region = revenues.region
    ORDER BY regions.region, months.month_number
    

    with result as below

    Row month_abr   region  revenue  
    1   jan         north   100  
    2   feb         north   150  
    3   mar         north   250  
    4   apr         north   0    
    5   may         north   0    
    6   jun         north   0    
    

    And finally - below is how same can look like for BigQuery Standard SQL

    #standardSQL
    WITH regions AS (
      SELECT DISTINCT region FROM revenues
    ), months AS (
    SELECT EXTRACT(MONTH FROM month) month_number,
      LOWER(FORMAT_DATE('%b', month)) month_abr
      FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2010-01-01', DATE '2010-12-01', INTERVAL 1 MONTH)) month
    )
    SELECT month_abr, region, COALESCE(revenues.revenue, 0) revenue
    FROM months
    CROSS JOIN regions
    LEFT JOIN revenues
    USING(month_abr, region)
    ORDER BY region, month_number
    

    You can be test, play with this using dummy data from your questions

    #standardSQL
    WITH revenues AS (
      SELECT 'jan' month_abr, 'north' region, 100 revenue UNION ALL
      SELECT 'feb', 'north', 150 UNION ALL
      SELECT 'mar', 'north', 250 
    ), regions AS (
      SELECT DISTINCT region FROM revenues
    ), months AS (
    SELECT EXTRACT(MONTH FROM month) month_number,
      LOWER(FORMAT_DATE('%b', month)) month_abr
      FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2010-01-01', DATE '2010-12-01', INTERVAL 1 MONTH)) month
    )
    SELECT month_abr, region, COALESCE(revenues.revenue, 0) revenue
    FROM months
    CROSS JOIN regions
    LEFT JOIN revenues
    USING(month_abr, region)
    ORDER BY region, month_number
    

    You should be able to apply above to your real use case