Search code examples
sqlgoogle-bigqueryaverageaggregate-functionsmedian

Calculating median difference of a column in BigQuery - Analytic functions cannot be arguments to aggregate functions


I have a column with all numbers called id. I'd like to get the median value of the difference between ids. For example, if the ids (sorted ascendingly) are [1, 5, 30, 35], then the difference would be [4, 25, 5] and the median. would be 5.

I tried

SELECT id - LAG(id) OVER (order by id) FROM `table` ORDER BY id ASC

and this produced a column of differences. However, if I try

SELECT AVG(id - LAG(id) OVER (order by id)) FROM `table` ORDER BY id ASC

or PERCENTILE_CONT(x, 0.5) for median, then I get an error that says Analytic functions cannot be arguments to aggregate functions. How can I solve this?

Thanks!


Solution

  • You can’t mix window and aggregate functions like that. You would need to use lag() in a subquery, then aggregate in the outer query:

    select avg(id - lag_id) avg_id_diff
    from (
        select id, lag(id) over(order by id) lag_id
        from mytable 
    ) t
    

    If you are looking for the median difference rather than the average:

    select percentile_cont(id - lag_id, 0.5) over() mediam_id_diff
    from (
        select id, lag(id) over(order by id) lag_id
        from mytable 
    ) t
    limit 1
    

    percentile_cont is available as a window function only (not as an aggregate function), which is why we use over() and limit clauses.