Search code examples
google-bigquerywindow-functionsgaps-and-islands

Sum Consecutive Months Based on Groups with Criteria where Only Certain Months Show


Building off a question I asked before "Sum Consecutive Months Based on Groups with Criteria". The main difference between these two is how would I do this when not all months are present and only the months with the criteria is shown? Basically a gaps and islands question where only the "islands" are in the data and the "gaps" are not

Example data

+--------+-----------+------------+
| Fruit  | SaleDate  | Top_Region |
+--------+-----------+------------+
| Apple  | 1/1/2017  |          1 |
| Apple  | 2/1/2017  |          1 |
| Apple  | 3/1/2017  |          1 |
| Apple  | 7/1/2017  |          1 |
| Apple  | 8/1/2017  |          1 |
| Apple  | 9/1/2017  |          1 |
| Apple  | 10/1/2017 |          1 |
| Banana | 3/1/2017  |          1 |
| Banana | 4/1/2017  |          1 |
| Banana | 5/1/2017  |          1 |
| Banana | 6/1/2017  |          1 |
| Banana | 7/1/2017  |          1 |
| Banana | 8/1/2017  |          1 |
| Banana | 10/1/2017 |          1 |
| Banana | 11/1/2017 |          1 |
+--------+-----------+------------+

Expected Output

+--------+-----------+-----------+-------+
| Fruit  |   Start   |    End    | Total |
+--------+-----------+-----------+-------+
| Apple  | 1/1/2017  | 3/1/2017  |     3 |
| Apple  | 7/1/2017  | 10/1/2017 |     4 |
| Banana | 3/1/2017  | 8/1/2017  |     6 |
| Banana | 10/1/2017 | 11/1/2017 |     2 |
+--------+-----------+-----------+-------+

The original solution does not work because the "gaps" are not present just the "islands"

I think this solution works, but am not sure if there is a better way to do it than this

--get previous date
WITH CTE1 as (
select t.*
  , lag(sale_date) over (partition by fruit order by sale_date asc) as prev_date
from mytable t
)
,
--see if the previous and current date are consecutive
CTE2 as (
  SELECT a.*
  , CASE WHEN sale_date!=DATE_ADD(prev_date, INTERVAL 1 MONTH) or prev_date is null then 1 else 0 end  as test
  FROM CTE1
)
,
--create groups base on the consecutive test
CTE3 as (
  SELECT b.*, SUM(test) OVER (partition by fruit ORDER BY sale_date) as groups
  FROM CTE2
)
--get rid of the non conseutive dates and need to add 1 to make sure all months are counts
SELECT fruit, groups, min(prev_date), max(sale_date), count(*)+1 as months
FROM CTE3
where test!=1
GROUP BY fruit, groups
order by 1

For example I need to add 1 to the count at the end to get the correct number of months. Is there a better more eloquent way to do this?


Solution

  • Consider below (BigQuery Standard SQL)

    select 
        fruit,
        min(saledate) start_date,
        max(saledate) end_date,
        count(*) total
    from (
        select 
            t.*,
            date_diff(SaleDate, min(SaleDate) over(partition by Fruit order by SaleDate), day) rn1,
            row_number() over(partition by fruit, Top_Region order by saledate) rn2
        from your_table t
    ) t
    where top_region = 1
    group by fruit, rn1 - rn2
    order by fruit, start_date    
    

    with output

    enter image description here