Search code examples
sqloracle-databaseoracle12c

SQL group by an column but segmented based on another column


I have this table which contain roughly more than 100000 rows and with 3 columns:

  • Account_number
  • Report_date
  • Outstanding_amount

I need to find a statement that group the outstanding amount by account but also cut it based on the date. Sample data for 1 account:

+----------------+-------------+--------------------+--+
| account_number | report_date | outstanding_amount |  |
+----------------+-------------+--------------------+--+
|              1 | 02/01/2019  |                100 |  |
|              1 | 03/01/2019  |                100 |  |
|              1 | 06/01/2019  |                200 |  |
|              1 | 07/01/2019  |                300 |  |
|              1 | 10/01/2019  |                200 |  |
|              1 | 11/01/2019  |                200 |  |
|              1 | 12/01/2019  |                100 |  |
+----------------+-------------+--------------------+--+    

So if I run this statement:

select * from (select account_number, min(report_date) mindate, max(report_date) maxdate, outstading_amount from table1 grouped by account_number, outstanding_amount)

The result of this statement should be similar to this:

+----------------+------------+------------+--------------------+
| account_number |  mindate   |  maxdate   | outstanding_amount |
+----------------+------------+------------+--------------------+
|              1 | 02/01/2019 | 12/01/2019 |                100 |
|              1 | 06/01/2019 | 11/01/2019 |                200 |
|              1 | 07/01/2019 | 07/01/2019 |                300 |
+----------------+------------+------------+--------------------+

So here I want to separate the result so that the days between mindate and maxdate of one row won't overlap the days in the next row. The result I'm looking is something like this:

+----------------+------------+------------+--------------------+
| account_number |  mindate   |  maxdate   | outstanding_amount |
+----------------+------------+------------+--------------------+
|              1 | 02/01/2019 | 03/01/2019 |                100 |
|              1 | 06/01/2019 | 06/01/2019 |                200 |
|              1 | 07/01/2019 | 07/01/2019 |                300 |
|              1 | 10/01/2019 | 11/01/2019 |                200 |
|              1 | 12/01/2019 | 12/01/2019 |                100 |
+----------------+------------+------------+--------------------+

Is it possible to construct this statement?


Solution

  • This is a gaps-and-islands problem. In this case, the simplest solution is probably the difference of row numbers:

    select account_number, outstanding_amount,
           min(report_date), max(report_date)
    from (select t.*,
                 row_number() over (partition by account_number order by report_date) as seqnum,
                 row_number() over (partition by account_number, outstanding_amount order by report_date) as seqnum_o
          from t
         ) t
    group by account_number, outstanding_amount, (seqnum - seqnum_o)
    order by account_number, min(report_date);
    

    Why this works is a little tricky to explain. But if you look at the results of the subquery, you will be able to see how the difference of row numbers defines the adjacent rows with the same amount.