Search code examples
sqloracle-databaseoracle11ganalytic-functions

How the get last change timestamp for row?


I have a data set like this (DDL below):

+----+------------------+----------------------+---------------------+
| ID | NAME             | EMAIL                | LAST_UPD            |
+----+------------------+----------------------+---------------------+
| 1  | JOHN SMITH       | [email protected] | 29/04/2017 10:50:51 |
+----+------------------+----------------------+---------------------+
| 1  | J SMITH          | [email protected] | 29/04/2017 10:51:15 |
+----+------------------+----------------------+---------------------+
| 1  | J SMITH          | [email protected] | 29/04/2017 10:51:36 |
+----+------------------+----------------------+---------------------+
| 1  | JOHN JAMES SMITH | [email protected] | 29/04/2017 10:52:11 |
+----+------------------+----------------------+---------------------+
| 2  | JAMES FORD       | [email protected] | 29/04/2017 10:52:57 |
+----+------------------+----------------------+---------------------+
| 2  | JAMES FORD       | [email protected]     | 29/04/2017 10:53:17 |
+----+------------------+----------------------+---------------------+
| 2  | JAMES FORD       | [email protected]     | 29/04/2017 11:47:15 |
+----+------------------+----------------------+---------------------+

I'm trying to get a last update date (timestamp for change) for name and email columns and the corresponding values from these fields within IDs. If the given attribute has not changed, the minimun LAST_UPD should be received. I tried this and got the values I would like to have but how to "squeeze" this for given ID?

SELECT 
ID,
NAME,
CASE 
  WHEN LAG(NAME)OVER(PARTITION BY ID ORDER BY LAST_UPD) != NAME 
  THEN LAST_UPD 
  WHEN LEAD(NAME)OVER(PARTITION BY ID ORDER BY LAST_UPD) = NAME
  THEN LAST_UPD
END NAME_CHANGED,
EMAIL, 
CASE 
  WHEN LAG(EMAIL)OVER(PARTITION BY ID ORDER BY LAST_UPD) != EMAIL 
  THEN LAST_UPD 
  WHEN LEAD(EMAIL)OVER(PARTITION BY ID ORDER BY LAST_UPD) = EMAIL
  THEN LAST_UPD
END EMAIL_CHANGED
FROM CUSTOMER
;

The result should be this:

+----+------------------+---------------------+----------------------+---------------------+
| ID | NAME             | NAME_CHANGED        | EMAIL                | EMAIL_CHANGED       |
+----+------------------+---------------------+----------------------+---------------------+
| 1  | JOHN JAMES SMITH | 29/04/2017 10:52:11 | [email protected] | 29/04/2017 10:50:51 |
+----+------------------+---------------------+----------------------+---------------------+
| 2  | JAMES FORD       | 29/04/2017 10:52:57 | [email protected]     | 29/04/2017 10:53:17 |
+----+------------------+---------------------+----------------------+---------------------+

DDL:

CREATE TABLE CUSTOMER 
(
  ID VARCHAR2(20) 
, NAME VARCHAR2(50) 
, EMAIL VARCHAR2(50) 
, LAST_UPD DATE 
);

REM INSERTING into CUSTOMER
SET DEFINE OFF;
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','JOHN SMITH','[email protected]',to_date('29/04/2017 10:50:51','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','J SMITH','[email protected]',to_date('29/04/2017 10:51:15','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','J SMITH','[email protected]',to_date('29/04/2017 10:51:36','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','JOHN JAMES SMITH','[email protected]',to_date('29/04/2017 10:52:11','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('2','JAMES FORD','[email protected]',to_date('29/04/2017 10:52:57','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('2','JAMES FORD','[email protected]',to_date('29/04/2017 10:53:17','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('2','JAMES FORD','[email protected]',to_date('29/04/2017 11:47:15','DD/MM/YYYY HH24:MI:SS'));
COMMIT;

SELECT * FROM CUSTOMER;

Solution

  • I think the key idea is to start with a flag that indicates whether an email or name changed. You can get this using lag. And, with the right logic, you will even mark the first record as a change.

    Then, you want the last record for each of the columns that is marked as a change. The following code does this using the first_value() function -- because it can ignore null values:

    select distinct id,
           first_value((case when name_changed then name end) ignore nulls) over (partition by id order by last_upd desc),
           max(case when name_changed then last_upd end) over (partition by id) as last_upd_name,
           first_value((case when email_changed then email end) ignore nulls) over (partition by id order by last_upd desc),
           max(case when email_changed then last_upd end) over (partition by id) as last_upd_email
    from (select c.*,
                 (case when c.name = lag(c.name) over (partition by c.id over order by c.last_upd) as name_changed,
                       then 0 else 1
                  end) as name_changed,
                 (case when c.email = lag(c.email) over (partition by c.id over order by c.last_upd) as email_change
                       then 0 else 1
                  end) as email_changed
          from customer c
         ) c;