Search code examples
sqlgoogle-bigquerywindow-functionsaggregationrolling-computation

Bigquery, find rolling latest value of groups and find minimum of group


Assume we have a product, seller, price and stock count change data in bigquery, like below. This data comes from changes (CDC) of a product listing table.

There may be more than one seller for a product, and different sellers have different price offerings for that product. And there are times when sellers go out of stock, we omit that seller from minimum calculation. So minimum price for that product at that time depends on sellers with stock.


WITH a AS
(
  SELECT 1 AS dt, 'p' as product_id, 'A' AS seller, 10 as stock, 100 as price
  UNION ALL
  SELECT 2 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 120 as price
  UNION ALL
  SELECT 3 AS dt, 'p' as product_id, 'C' AS seller, 10 as stock, 150 as price
  UNION ALL
  SELECT 4 AS dt, 'p' as product_id, 'D' AS seller, 10 as stock, 300 as price
  UNION ALL
  SELECT 5 AS dt, 'p' as product_id, 'E' AS seller, 10 as stock, 400 as price
  UNION ALL
  SELECT 6 AS dt, 'p' as product_id, 'F' AS seller, 10 as stock, 500 as price
  UNION ALL
  SELECT 7 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 600 as price
  UNION ALL
  SELECT 8 AS dt, 'p' as product_id, 'A' AS seller, 0 as stock, 100 as price
  UNION ALL
  SELECT 9 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 110 as price
  UNION ALL
  SELECT 10 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 190 as price
  UNION ALL
  SELECT 11 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 800 as price
  UNION ALL
  SELECT 12 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 100 as price
)

SELECT *
FROM a

I want to calculate minimum price and seller with minimum price of this product at every time t like:

Desired Output:

dt product_id minimum_price seller_with_minimum_price
1 p 100 A
2 p 100 A
3 p 100 A
4 p 100 A
5 p 100 A
6 p 100 A
7 p 100 A
8 p 120 B
9 p 110 B
10 p 150 C
11 p 150 C
12 p 100 G

At time 1, there is one seller only for that product. So the minimum price is 100 and seller with minimum price is A. At time 2, there comes a second seller but the minimum price is still 100 from seller A. Until time 7, the status is the same.

At time 8, seller A is out of stock, so the minimum price for this product is 120 with seller B. At time 9, seller B decreases the price, so min price is 110 with seller B. At time 10, B goes out of stock. At time 11, G raises its price, so it has no effect. At time 12, seller G lowers its price. So minimum price (for seller with stock in hand) at time 12 is 100 and seller is G.

In short, I want to find the minimum price of a product at different times for sellers who have stock.

To calculate this logic, I have found a solution which includes cross-join, but it takes too long and too much resource. I'd like to know if there is a better solution for this. I've searched stackoverflow + google, but could not find a good solution.


Solution

  • Use below approach

    WITH previous_prices AS (
      SELECT *, ARRAY_AGG(STRUCT(dt, seller, stock, price)) OVER win prev
      FROM your_data  
      WINDOW win AS (PARTITION BY product_id ORDER BY dt)
    ), last_qualifying_prices AS (
      SELECT dt, product_id, 
        ARRAY(
          SELECT AS STRUCT price, seller
          FROM t.prev 
          QUALIFY ROW_NUMBER() OVER win = 1 AND stock > 0
          WINDOW win AS (PARTITION BY seller ORDER BY dt DESC)
        ) candidates
      FROM previous_prices t
    )
    SELECT dt, product_id, candidate.*
    FROM last_qualifying_prices, 
    UNNEST(candidates) candidate
    QUALIFY ROW_NUMBER() OVER win = 1
    WINDOW win AS (PARTITION BY dt, product_id ORDER BY candidate.price)
    ORDER BY dt    
    

    if applied to sample data in your question - output is

    enter image description here