Search code examples
sqlgoogle-bigquerybigquery-udf

Bigquery macros/repeated query parts


We use Bigquery to calculate many of our daily metrics, but we are always interested in longer term averages (7day, 14day, 28day, QTD, YTD) as well.

This is always done like this (ds: date):

AVG(metric_1d) OVER ( 
  ORDER BY ds 
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW 
) AS metric_7d,
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 13 PRECEDING AND CURRENT ROW 
) AS metric_14d,
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 27 PRECEDING AND CURRENT ROW 
) AS metric_28d,
AVG(metric_1d) OVER (
  PARTITION BY CONCAT(EXTRACT(YEAR FROM ds), DIV(EXTRACT(MONTH FROM ds)-1, 3))
  ORDER BY ds
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_qtd,
AVG(metric_1d) OVER (
  PARTITION BY EXTRACT(YEAR FROM ds)
  ORDER BY ds
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_ytd,
ds
FROM (
  SELECT
    ... AS metric_1d
    ...

What I do not like is that basically the same code is repeated in all metrics queries (sometimes multiple times if there are multiple metrics calculated). Is there a recommended way to simplify this, maybe using some sort of macro or UDF?


Solution

  • I don't see any help with macro (rather then using scripting which will further complicate the code) or udf here. Rather I can recommend using WINDOW clause - this will address two aspects: improve readability of code and eliminate code redundancy in case of using multiple metric/analytics calculation over same windows

    so, I would re-write your code as below

    select ds, 
      avg(metric_1d) over last_7d as metric_7d,
      avg(metric_1d) over last_14d as metric_14d,
      avg(metric_1d) over last_28d as metric_28d,
      avg(metric_1d) over qtd as metric_qtd,
      avg(metric_1d) over ytd as metric_ytd,
    from your_table
    window 
      last_7d  as (order by ds rows between  6 preceding and current row),
      last_14d as (order by ds rows between 13 preceding and current row),
      last_28d as (order by ds rows between 27 preceding and current row),
      qtd as (
        partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
        order by ds rows between unbounded preceding and current row
      ),
      ytd as (partition by extract(year from ds)
        order by ds rows between unbounded preceding and current row
      )         
    

    and if you would add more metrics like sum or count for example - it would be as simple as just below

    select ds, 
      avg(metric_1d) over last_7d as metric_7d,
      sum(metric_1d) over last_7d as metric2_7d,
      count(metric_1d) over last_7d as metric3_7d,
      avg(metric_1d) over last_14d as metric_14d,
      sum(metric_1d) over last_14d as metric2_14d,
      count(metric_1d) over last_14d as metric3_14d,
      avg(metric_1d) over last_28d as metric_28d,
      sum(metric_1d) over last_28d as metric2_28d,
      count(metric_1d) over last_28d as metric3_28d,
      avg(metric_1d) over qtd as metric_qtd,
      sum(metric_1d) over qtd as metric2_qtd,
      count(metric_1d) over qtd as metric3_qtd,
      avg(metric_1d) over ytd as metric_ytd,
      sum(metric_1d) over ytd as metric2_ytd,
      count(metric_1d) over ytd as metric3_ytd,
    from your_table
    window 
      last_7d  as (order by ds rows between  6 preceding and current row),
      last_14d as (order by ds rows between 13 preceding and current row),
      last_28d as (order by ds rows between 27 preceding and current row),
      qtd as (
        partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
        order by ds rows between unbounded preceding and current row
      ),
      ytd as (partition by extract(year from ds)
        order by ds rows between unbounded preceding and current row
      )