Search code examples
sqlscalaapache-sparklaggaps-and-islands

SQL group by changing column


Suppose I have a table sorted by date as so:

+-------------+--------+
|    DATE     |  VALUE |
+-------------+--------+
|  01-09-2020 |   5    |
|  01-15-2020 |   5    |
|  01-17-2020 |   5    |
|  02-03-2020 |   8    |
|  02-13-2020 |   8    |
|  02-20-2020 |   8    |
|  02-23-2020 |   5    |
|  02-25-2020 |   5    |
|  02-28-2020 |   3    |
|  03-13-2020 |   3    |
|  03-18-2020 |   3    |
+-------------+--------+

I want to group by changes in value within that given date range, and add a value that increments each time as an added column to denote that.

I have tried a number of different things, such as using the lag function:

SELECT value, value - lag(value) over (order by date) as count
GROUP BY value

In short, I want to take the table above and have it look like:

+-------------+--------+-------+
|    DATE     |  VALUE | COUNT |
+-------------+--------+-------+
|  01-09-2020 |   5    |   1   |
|  01-15-2020 |   5    |   1   |
|  01-17-2020 |   5    |   1   |
|  02-03-2020 |   8    |   2   |
|  02-13-2020 |   8    |   2   |
|  02-20-2020 |   8    |   2   |
|  02-23-2020 |   5    |   3   |
|  02-25-2020 |   5    |   3   |
|  02-28-2020 |   3    |   4   |
|  03-13-2020 |   3    |   4   |
|  03-18-2020 |   3    |   4   |
+-------------+--------+-------+

I want to eventually have it all in one small table with the earliest date for each.

+-------------+--------+-------+
|    DATE     |  VALUE | COUNT |
+-------------+--------+-------+
|  01-09-2020 |   5    |   1   |
|  02-03-2020 |   8    |   2   |
|  02-23-2020 |   5    |   3   |
|  02-28-2020 |   3    |   4   |
+-------------+--------+-------+

Any help would be very appreciated


Solution

  • You can use a lag and cumulative sum and a subquery:

    SELECT value,
           SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date)
    FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
          FROM t
         ) t
    

    Here is a db<>fiddle.