Search code examples
sqlpostgresqlsql-updatewindow-functionsgaps-and-islands

Add a column and populate based on other columns values


I have the following table:

CREATE TABLE trajectory(
    user_id int, 
    session_id int, 
    timestamp timestamp with time zone,
    lat double precision, 
    lon double precision
);

INSERT INTO trajectory(user_id, session_id, timestamp, lat, lon) VALUES 
(1, 25304,'2008-10-23 02:53:04+01', 39.984702, 116.318417),
(1, 25304, '2008-10-23 02:53:10+01', 39.984683, 116.31845), 
(1, 25304, '2008-10-23 02:53:15+01', 39.984686, 116.318417),
(1, 25304, '2008-10-23 02:53:20+01', 39.984688, 116.318385), 
(1, 20959,'2008-10-24 02:09:59+01', 40.008304, 116.319876),
(1, 20959,'2008-10-24 02:10:04+01', 40.008413, 116.319962), 
(1, 20959,'2008-10-24 02:10:14+01', 40.007171, 116.319458),
(2, 55305, '2008-10-23 05:53:05+01', 39.984094, 116.319236), 
(2, 55305, '2008-10-23 05:53:11+01', 39.984198, 116.319322), 
(2, 55305, '2008-10-23 05:53:21+01', 39.984224, 116.319402), 
(2, 34104, '2008-10-23 23:41:04+01', 40.013867, 116.306473),
(2, 34104, '2008-10-23 23:41:16+01', 40.013907, 116.306488);

Question:

I want to add a column trip_id to this table, based on the user_id and session_id columns, so that when the user's session id changes, I know the user is on a new trip so I add that id to the new trip column.

Required output:

user_id |session_id  |timestamp              |    lat       |   lon     | trip_id
--------|------------|-----------------------|--------------|-----------|-----------
  1     |     25304  |2008-10-23 02:53:04+01 | 39.984702    |116.318417 |       1
  1     |     25304  |2008-10-23 02:53:10+01 | 39.984683    |116.31845  |       1
  1     |     25304  |2008-10-23 02:53:15+01 | 39.984686    |116.318417 |       1
  1     |     25304  |2008-10-23 02:53:20+01 | 39.984688    |116.318385 |       1
  1     |     20959  |2008-10-24 02:09:59+01 |40.008304     |116.319876 |       2
  1     |     20959  |2008-10-24 02:10:04+01 |40.008413     |116.319962 |       2
  1     |     20959  |2008-10-24 02:10:14+01 |40.007171     |116.319458 |       2
  2     |     55305  |2008-10-23 05:53:05+01 |39.984094     |116.319236 |       1
  2     |     55305  |2008-10-23 05:53:11+01 |39.984198     |116.319322 |       1
  2     |     55305  |2008-10-23 05:53:21+01 |39.984224     |116.319402 |       1
  2     |     34104  |2008-10-23 23:41:04+01 |40.013867     |116.306473 |       2
  2     |     34104  |2008-10-23 23:41:16+01 |40.013907     |116.306488 |       2

How can I do this?

EDIT

Thanks for these great answers, but all the answers received are kind of retrieving table values, they do not modify the table. Plus, I added the timestamp column that I initially omitted thinking that would simplify my question.


Solution

  • This can be accomplished by using the window function lag() to retrieve the previous row and determine whether or not it has changed, followed by the window function sum() to retrieve the trip_id:

    with cte as (
      select *, case when 
                     session_id - lag(session_id, 1, session_id) 
                                  over (partition by user_id order by timestamp) = 0 
                     then 0 else 1 end as diff
      from trajectory
    )
    select user_id, session_id, timestamp,  lat, lon, 1 + sum(diff) over (partition by user_id order by timestamp) as trip_id
    from cte
    order by user_id, timestamp
    

    Assuming the user_id/timestamp is unique, the update can be :

    ALTER TABLE trajectory ADD COLUMN trip_id int;
    
    with cte as (
      select *, case when session_id - lag(session_id, 1, session_id) over (partition by user_id order by timestamp) = 0 then 0 else 1 end as diff
      from trajectory
    ),
    cte2 as (
      select user_id,   session_id, timestamp,  lat, lon, 1 + sum(diff) over (partition by user_id order by timestamp) as trip_id
      from cte
      order by user_id, timestamp
    )
    UPDATE trajectory 
    SET trip_id = cte2.trip_id
    FROM cte2
    WHERE trajectory.timestamp = cte2.timestamp and trajectory.user_id = cte2.user_id
    

    Result :

    user_id session_id timestamp lat lon trip_id
    1 25304 2008-10-23 02:53:04+01 39.984702 116.318417 1
    1 25304 2008-10-23 02:53:10+01 39.984683 116.31845 1
    1 25304 2008-10-23 02:53:15+01 39.984686 116.318417 1
    1 25304 2008-10-23 02:53:20+01 39.984688 116.318385 1
    1 20959 2008-10-24 02:09:59+01 40.008304 116.319876 2
    1 20959 2008-10-24 02:10:04+01 40.008413 116.319962 2
    1 20959 2008-10-24 02:10:14+01 40.007171 116.319458 2
    2 55305 2008-10-23 05:53:05+01 39.984094 116.319236 1
    2 55305 2008-10-23 05:53:11+01 39.984198 116.319322 1
    2 55305 2008-10-23 05:53:21+01 39.984224 116.319402 1
    2 34104 2008-10-23 23:41:04+01 40.013867 116.306473 2
    2 34104 2008-10-23 23:41:16+01 40.013907 116.306488 2

    Demo here