I have an example dataset in my SQL table, and I need to group the data in a specific way.
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 0 | 2023-01-02 |
| 8 | 10 | 2023-01-03 |
| 8 | 0 | 2023-01-04 |
| 8 | 200 | 2023-01-05 |
| 8 | 200 | 2023-01-06 |
| 8 | 200 | 2023-01-07 |
| 8 | 200 | 2023-01-08 |
| 8 | 200 | 2023-01-09 |
| 778 | 200 | 2023-10-25 |
| 778 | 200 | 2023-10-26 |
+-------+----------+----------------+
I would like to have this as the final outcome: grouping the amounts that are the same by ascending days and taking the minimum date from them. Additionally, grouping should be done based on the different IDs. As seen in the image below, you can see that the ID 8 with DATE as 2023-01-1 & 2023-01-02 are grouped together. The next record has a mutation of SUM = 10, so it is also seen as a separate row. Then, the SUM = 10 is mutated back to SUM = 0, which should also be treated as a new row.
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 10 | 2023-01-03 |
| 8 | 0 | 2023-01-04 |
| 8 | 200 | 2023-01-05 |
| 778 | 200 | 2023-10-25 |
+-------+----------+----------------+
I have tried the SQL query below, but I have realized that the row with ID = 8, SUM = 0, DATE 2023-01-04 is being grouped together with the rows from 2023-01-01 & 2023-01-02 and is missing
SELECT
[ID],
[SUM],
MIN([DATE]) AS [Date]
FROM [dbo].[test]
GROUP BY [ID], [SUM]
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 10 | 2023-01-03 |
| 8 | 200 | 2023-01-05 |
| 778 | 200 | 2023-10-25 |
+-------+----------+----------------+
Can someone help me come up with a clever solution that takes into account the ascending DATE column while grouping, or else create a new row with the mutation?
If you just want the first row from each group you can do:
select id, sum, [date]
from (
select t.*, lag([date]) over (partition by id, sum order by [date]) as prev_date
from mytable t
) x
where prev_date is null
order by [date]
Result:
id sum date
---- ---- ----------
8 0 2023-01-01
8 10 2023-01-03
8 200 2023-01-05
778 200 2023-10-25
See running example at db<>fiddle.