Building off a question I asked before "Sum Consecutive Months Based on Groups with Criteria". The main difference between these two is how would I do this when not all months are present and only the months with the criteria is shown? Basically a gaps and islands question where only the "islands" are in the data and the "gaps" are not
Example data
+--------+-----------+------------+
| Fruit | SaleDate | Top_Region |
+--------+-----------+------------+
| Apple | 1/1/2017 | 1 |
| Apple | 2/1/2017 | 1 |
| Apple | 3/1/2017 | 1 |
| Apple | 7/1/2017 | 1 |
| Apple | 8/1/2017 | 1 |
| Apple | 9/1/2017 | 1 |
| Apple | 10/1/2017 | 1 |
| Banana | 3/1/2017 | 1 |
| Banana | 4/1/2017 | 1 |
| Banana | 5/1/2017 | 1 |
| Banana | 6/1/2017 | 1 |
| Banana | 7/1/2017 | 1 |
| Banana | 8/1/2017 | 1 |
| Banana | 10/1/2017 | 1 |
| Banana | 11/1/2017 | 1 |
+--------+-----------+------------+
Expected Output
+--------+-----------+-----------+-------+
| Fruit | Start | End | Total |
+--------+-----------+-----------+-------+
| Apple | 1/1/2017 | 3/1/2017 | 3 |
| Apple | 7/1/2017 | 10/1/2017 | 4 |
| Banana | 3/1/2017 | 8/1/2017 | 6 |
| Banana | 10/1/2017 | 11/1/2017 | 2 |
+--------+-----------+-----------+-------+
The original solution does not work because the "gaps" are not present just the "islands"
I think this solution works, but am not sure if there is a better way to do it than this
--get previous date
WITH CTE1 as (
select t.*
, lag(sale_date) over (partition by fruit order by sale_date asc) as prev_date
from mytable t
)
,
--see if the previous and current date are consecutive
CTE2 as (
SELECT a.*
, CASE WHEN sale_date!=DATE_ADD(prev_date, INTERVAL 1 MONTH) or prev_date is null then 1 else 0 end as test
FROM CTE1
)
,
--create groups base on the consecutive test
CTE3 as (
SELECT b.*, SUM(test) OVER (partition by fruit ORDER BY sale_date) as groups
FROM CTE2
)
--get rid of the non conseutive dates and need to add 1 to make sure all months are counts
SELECT fruit, groups, min(prev_date), max(sale_date), count(*)+1 as months
FROM CTE3
where test!=1
GROUP BY fruit, groups
order by 1
For example I need to add 1 to the count at the end to get the correct number of months. Is there a better more eloquent way to do this?
Consider below (BigQuery Standard SQL)
select
fruit,
min(saledate) start_date,
max(saledate) end_date,
count(*) total
from (
select
t.*,
date_diff(SaleDate, min(SaleDate) over(partition by Fruit order by SaleDate), day) rn1,
row_number() over(partition by fruit, Top_Region order by saledate) rn2
from your_table t
) t
where top_region = 1
group by fruit, rn1 - rn2
order by fruit, start_date
with output