Search code examples
sqlhivesqlhelpermysql-helper

Lag Based on Specified Order in SQL


Does anyone know of a way to LAG in a specified order? In the example below, I provide a table with my current output [Lag (Current)] and my desired output [Lag (Desired)] columns. I am interested in creating a lag for the event time based on event time order, need to make sure that the Groups do not get out of order. The issue is that Group CL1 is currently not being lagged based on time, its based on the group, so that is where I am struggling.

Personal ID Event Time Groups Lag (Current) Lag (Desired)
99999999 4:18:58 PM GM1
99999999 4:21:03 PM GM1 4:18:58 PM 4:18:58 PM
99999999 4:21:42 PM CL1
99999999 4:25:04 PM CL1 4:21:42 PM 4:21:42 PM
99999999 4:25:40 PM GM2
99999999 4:25:45 PM GM2 4:25:40 PM 4:25:40 PM
99999999 4:26:14 PM CL1 4:25:04 PM
99999999 4:26:23 PM CL1 4:21:42 PM 4:21:42 PM

Here is the SQL I have tried.

SELECT LAG() OVER (PARTITION BY Personal ID, Groups ORDER BY Personal ID, Event Time)

Updated

Second example

The below seemed to work, but not sure if its correct, especially as more data passes through. It did work for this example and the one you provided above, I just added the groups in the the last w2.

WITH sample_table AS (
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'CL1' groups UNION ALL
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'FZ1' UNION ALL
  SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' 
)

Expected Output enter image description here


Solution

  • Consider below gaps and islands approach.

    WITH sample_table AS (
      SELECT '99999999' personal_id, '4:18:58 PM' event_time, 'GM1' groups UNION ALL
      SELECT '99999999' personal_id, '4:21:03 PM' event_time, 'GM1' UNION ALL
      SELECT '99999999' personal_id, '4:21:42 PM' event_time, 'CL1' UNION ALL
      SELECT '99999999' personal_id, '4:25:04 PM' event_time, 'CL1' UNION ALL
      SELECT '99999999' personal_id, '4:25:40 PM' event_time, 'GM2' UNION ALL
      SELECT '99999999' personal_id, '4:25:45 PM' event_time, 'GM2' UNION ALL
      SELECT '99999999' personal_id, '4:26:14 PM' event_time, 'CL1' UNION ALL
      SELECT '99999999' personal_id, '4:26:23 PM' event_time, 'CL1' 
    )
    SELECT personal_id, event_time, groups, LAG(event_time) OVER w2 AS lag FROM (
      SELECT *, COUNT(1) OVER w1 - SUM(IF(flag, 1, 0)) OVER w1 AS part FROM (
        SELECT *, groups = LAG(groups) OVER w0 AS flag 
          FROM sample_table
        WINDOW w0 AS (PARTITION BY personal_id ORDER BY event_time)
      ) t WINDOW w1 AS (PARTITION BY personal_id ORDER BY event_time)
    ) t WINDOW w2 AS (PARTITION BY personal_id, part ORDER BY event_time);
    

    Query results

    enter image description here

    Updated

    WITH sample_table AS (
      SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'CL1' groups UNION ALL
      SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'GM2' UNION ALL
      SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'FZ1' UNION ALL
      SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' UNION ALL
      SELECT '99999999' personal_id, '8:13:43 AM' event_time, 'GM2'
    )
    SELECT personal_id, event_time, groups, LAG(event_time) OVER w2 AS lag FROM (
      SELECT *, COUNT(1) OVER w1 - SUM(IF(flag, 1, 0)) OVER w1 AS part FROM (
        SELECT *, groups = LAG(groups) OVER w0 AS flag 
          FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY personal_id, event_time) AS rn FROM sample_table) t
        WINDOW w0 AS (PARTITION BY personal_id ORDER BY event_time, rn)
      ) t WINDOW w1 AS (PARTITION BY personal_id ORDER BY event_time, rn)
    ) t WINDOW w2 AS (PARTITION BY personal_id, part ORDER BY event_time, rn);
    

    Query results

    enter image description here