Search code examples
sqlsql-servert-sqlgaps-and-islands

How can I create an intelligent grouping in T-SQL to group a dataset based on an ascending DATE value?


I have an example dataset in my SQL table, and I need to group the data in a specific way.

+-------+----------+----------------+
| ID    | SUM      |  DATE          |
+-------+----------+----------------+
|     8 |        0 |     2023-01-01 |
|     8 |        0 |     2023-01-02 |
|     8 |       10 |     2023-01-03 |
|     8 |        0 |     2023-01-04 |
|     8 |      200 |     2023-01-05 |
|     8 |      200 |     2023-01-06 |
|     8 |      200 |     2023-01-07 |
|     8 |      200 |     2023-01-08 |
|     8 |      200 |     2023-01-09 |
|   778 |      200 |     2023-10-25 |
|   778 |      200 |     2023-10-26 |
+-------+----------+----------------+

I would like to have this as the final outcome: grouping the amounts that are the same by ascending days and taking the minimum date from them. Additionally, grouping should be done based on the different IDs. As seen in the image below, you can see that the ID 8 with DATE as 2023-01-1 & 2023-01-02 are grouped together. The next record has a mutation of SUM = 10, so it is also seen as a separate row. Then, the SUM = 10 is mutated back to SUM = 0, which should also be treated as a new row.

+-------+----------+----------------+
| ID    | SUM      |  DATE          |
+-------+----------+----------------+
|     8 |        0 |     2023-01-01 |
|     8 |       10 |     2023-01-03 |
|     8 |        0 |     2023-01-04 |
|     8 |      200 |     2023-01-05 |
|   778 |      200 |     2023-10-25 |
+-------+----------+----------------+

I have tried the SQL query below, but I have realized that the row with ID = 8, SUM = 0, DATE 2023-01-04 is being grouped together with the rows from 2023-01-01 & 2023-01-02 and is missing

SELECT
      [ID],
      [SUM],
      MIN([DATE]) AS [Date]
 FROM [dbo].[test] 
 GROUP BY [ID], [SUM]
+-------+----------+----------------+
| ID    | SUM      |  DATE          |
+-------+----------+----------------+
|     8 |        0 |     2023-01-01 |
|     8 |       10 |     2023-01-03 |
|     8 |      200 |     2023-01-05 |
|   778 |      200 |     2023-10-25 |
+-------+----------+----------------+

Can someone help me come up with a clever solution that takes into account the ascending DATE column while grouping, or else create a new row with the mutation? 


Solution

  • If you just want the first row from each group you can do:

    select id, sum, [date]
    from (
      select t.*, lag([date]) over (partition by id, sum order by [date]) as prev_date
      from mytable t
    ) x
    where prev_date is null
    order by [date]
    

    Result:

     id   sum  date       
     ---- ---- ---------- 
     8    0    2023-01-01 
     8    10   2023-01-03 
     8    200  2023-01-05 
     778  200  2023-10-25 
    

    See running example at db<>fiddle.