I have a query that returns the following, EXCEPT for the last column, which is what I need to figure out how to create. For each given ObservationID,
I need to return the date on which the status changes; something like a LEAD() function that would take conditions and not just offsets. Can it be done?
I need to calculate the Change Date column; it should be the first later date, within the same Region, on which the status differs from the current row's status.
+---------------+--------+-----------+--------+-------------+
| ObservationID | Region | Date | Status | Change Date | <-This field
+---------------+--------+-----------+--------+-------------+
| 1 | 10 | 1/3/2012 | Ice | 1/4/2012 |
| 2 | 10 | 1/4/2012 | Water | 1/6/2012 |
| 3 | 10 | 1/5/2012 | Water | 1/6/2012 |
| 4 | 10 | 1/6/2012 | Gas | 1/7/2012 |
| 5 | 10 | 1/7/2012 | Ice | |
| 6 | 20 | 2/6/2012 | Water | 2/10/2012 |
| 7 | 20 | 2/7/2012 | Water | 2/10/2012 |
| 8 | 20 | 2/8/2012 | Water | 2/10/2012 |
| 9 | 20 | 2/9/2012 | Water | 2/10/2012 |
| 10 | 20 | 2/10/2012 | Ice | |
+---------------+--------+-----------+--------+-------------+
A MODEL clause (Oracle 10g+) can do this in a compact way:
create table observation(ObservationID, Region, obs_date, Status)
as
select 1, 10, date '2012-03-01', 'Ice' from dual union all
select 2, 10, date '2012-04-01', 'Water' from dual union all
select 3, 10, date '2012-05-01', 'Water' from dual union all
select 4, 10, date '2012-06-01', 'Gas' from dual union all
select 5, 10, date '2012-07-01', 'Ice' from dual union all
select 6, 20, date '2012-06-02', 'Water' from dual union all
select 7, 20, date '2012-07-02', 'Water' from dual union all
select 8, 20, date '2012-08-02', 'Water' from dual union all
select 9, 20, date '2012-09-02', 'Water' from dual union all
select 10, 20, date '2012-10-02', 'Ice' from dual;
Table created.
select ObservationID, obs_date, Status, status_change
from observation
model
  dimension by (Region, obs_date, Status)
  measures (ObservationID, obs_date obs_date2, cast(null as date) status_change)
  rules (
    status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]
  )
order by 1;
OBSERVATIONID OBS_DATE  STATU STATUS_CH
------------- --------- ----- ---------
            1 01-MAR-12 Ice   01-APR-12
            2 01-APR-12 Water 01-JUN-12
            3 01-MAY-12 Water 01-JUN-12
            4 01-JUN-12 Gas   01-JUL-12
            5 01-JUL-12 Ice
            6 02-JUN-12 Water 02-OCT-12
            7 02-JUL-12 Water 02-OCT-12
            8 02-AUG-12 Water 02-OCT-12
            9 02-SEP-12 Water 02-OCT-12
           10 02-OCT-12 Ice
fiddle: http://sqlfiddle.com/#!4/f6687/1
That is, we dimension on Region, obs_date and Status because we want to look at cells with the same region, but get the first date on which the status differs.
We also have to expose the date as a measure, so I created the alias obs_date2 for that, and we want a new column, status_change, to hold the date the status changed.
This is the rule that does all the working out for us:
status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]
It says: across our three dimensions, only look at rows with the same region (cv(Region)), whose date follows the date of the current row (obs_date > cv(obs_date)), and whose status is different from the current row's (status != cv(status)); finally, take the minimum date that satisfies this set of conditions (min(obs_date2)) and assign it to status_change. When no row satisfies the conditions, as for the last observation in each region, status_change stays null. The any,any,any part on the left means this calculation applies to all rows.
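For comparison, the same three conditions can also be written as a correlated scalar subquery, which may be easier to read if the MODEL syntax is unfamiliar. This is only a sketch against the observation table created above; the aliases o and nxt are illustrative names, not part of the original query. It should give the same status_change values as the MODEL output, including nulls for the last observation in each region.

```sql
-- Sketch: the MODEL rule's three conditions as a correlated scalar subquery.
-- Assumes the observation table from the CREATE TABLE statement above.
select o.ObservationID,
       o.obs_date,
       o.Status,
       (select min(nxt.obs_date)
          from observation nxt
         where nxt.Region   = o.Region     -- same region, like cv(Region)
           and nxt.obs_date > o.obs_date   -- later date, like obs_date > cv(obs_date)
           and nxt.Status  != o.Status     -- different status, like status != cv(status)
       ) as status_change                  -- null when no later row qualifies
  from observation o
 order by o.ObservationID;
```

The subquery is evaluated once per row, so on a large table it may be slower than other approaches; treat it as a way to check the logic of the rule rather than as a tuned solution.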