Search code examples
sqlpostgresqlgoogle-bigquerywindow-functions

RANGE clause for window function to get the previous sum


I have the following data, which I am trying to get the profit from the previous year:

WITH tbl (year, country, product, profit) AS (
   VALUES
     (2000, 'Finland', 'Computer'  , 1500)
   , (2000, 'Finland', 'Phone'     ,  100)
   , (2001, 'Finland', 'Phone'     ,   10)
   , (2000, 'India'  , 'Calculator',   75)
   , (2000, 'India'  , 'Calculator',   75)
   , (2000, 'India'  , 'Computer'  , 1200)
   )
select country, year, profit
     , lag(profit) over (partition by country order by year)
from tbl;

┌─────────┬──────┬────────┬──────────────────────────┐
│ country ┆ year ┆ profit ┆ sum_profit_previous_year │
╞═════════╪══════╪════════╪══════════════════════════╡
│ India   ┆ 2000 ┆     75 ┆                          │
│ India   ┆ 2000 ┆     75 ┆                       75 │
│ India   ┆ 2000 ┆   1200 ┆                       75 │
│ Finland ┆ 2000 ┆   1500 ┆                          │
│ Finland ┆ 2000 ┆    100 ┆                     1500 │
│ Finland ┆ 2001 ┆     10 ┆                      100 │
└─────────┴──────┴────────┴──────────────────────────┘

However, this just seems to get the previous row, rather than what I want, which is to get the LAG of the profit value for the previous year for that country. The expected result should be:

┌─────────┬──────┬────────┬──────────────────────────┐
│ country ┆ year ┆ profit ┆ sum_profit_previous_year |
╞═════════╪══════╪════════╪══════════════════════════╡
│ India   ┆ 2000 ┆     75 ┆                          │
│ India   ┆ 2000 ┆     75 ┆                          │
│ India   ┆ 2000 ┆   1200 ┆                          │
│ Finland ┆ 2000 ┆   1500 ┆                          │
│ Finland ┆ 2000 ┆    100 ┆                          │
│ Finland ┆ 2001 ┆     10 ┆                   1600   │
└─────────┴──────┴────────┴──────────────────────────┘

Since Finland-2001 is the only record that also has a record for that same country for the previous year. What would be the correct RANGE clause here to accomplish this? (Either BigQuery or Postgres is fine for testing purposes).


Solution

  • If you indeed want all un-aggregated rows, and add the aggregated profit for the previous year ...

    Here is a way with a plain window function and a custom window frame:

    SELECT country, year, profit
         , sum(profit) OVER (PARTITION BY country ORDER BY year
                             RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING) AS sum_profit_previous_year
    FROM   tbl
    ORDER  BY country DESC, year;  -- optional?
    

    fiddle

    Details in the manual here.