Tags: postgresql, concatenation, window-functions, case-when

Concatenate string by a condition


I want to fill a new column whose value, for every row, depends on the devices seen across all rows with the same user_id.

For example, when a user's device changes from desktop to mobile, assign desktop > mobile to all records for that user_id. When there is more than one change, e.g. from tablet to desktop and then from desktop to mobile, assign tablet > desktop > mobile. A device that reappears later (like desktop at step5 for user 7bc6de) is not listed a second time.
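
A minimal Python sketch of that rule, assuming one user's devices are already listed in step order: keep each device the first time it appears and join the result with ' > '. The function name device_path is hypothetical and only illustrates the rule; it relies on dict preserving insertion order (Python 3.7+).

```python
def device_path(devices):
    """Join distinct devices in order of first appearance with ' > '.

    Assumes `devices` is one user's device list already sorted by step.
    dict.fromkeys drops repeats while keeping the first occurrence's position.
    """
    return " > ".join(dict.fromkeys(devices))


print(device_path(["desktop", "desktop", "mobile", "mobile", "desktop"]))
print(device_path(["tablet", "tablet", "desktop", "desktop", "mobile"]))
```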

Here is the sample data:

+---------+-------+---------+
| user_id | step  | device  |
+---------+-------+---------+
| 7bc6de  | step1 | desktop |
| 7bc6de  | step2 | desktop |
| 7bc6de  | step3 | mobile  |
| 7bc6de  | step4 | mobile  |
| 7bc6de  | step5 | desktop |
| 0ee6df  | step1 | tablet  |
| 0ee6df  | step2 | tablet  |
| 0ee6df  | step3 | desktop |
| 0ee6df  | step4 | desktop |
| 0ee6df  | step5 | mobile  |
+---------+-------+---------+

Desired output:

+---------+-------+---------+---------------------------+
| user_id | step  | device  |    device_concatenated    |
+---------+-------+---------+---------------------------+
| 7bc6de  | step1 | desktop | desktop > mobile          |
| 7bc6de  | step2 | desktop | desktop > mobile          |
| 7bc6de  | step3 | mobile  | desktop > mobile          |
| 7bc6de  | step4 | mobile  | desktop > mobile          |
| 7bc6de  | step5 | desktop | desktop > mobile          |
| 0ee6df  | step1 | tablet  | tablet > desktop > mobile |
| 0ee6df  | step2 | tablet  | tablet > desktop > mobile |
| 0ee6df  | step3 | desktop | tablet > desktop > mobile |
| 0ee6df  | step4 | desktop | tablet > desktop > mobile |
| 0ee6df  | step5 | mobile  | tablet > desktop > mobile |
+---------+-------+---------+---------------------------+

Additional scenario:

The table can also contain duplicate steps, i.e. a user can see the same step on different devices at different times. In that case, how can I keep only the first (earliest created_at) row per user and step, and build the device change sequence from those rows, as in the expected result below?

+---------+-------+---------------------+---------+
| user_id | step  |     created_at      | device  |
+---------+-------+---------------------+---------+
| user1   | step1 | 2021-03-16 14:03:16 | mobile  |
| user1   | step2 | 2021-03-16 14:04:07 | mobile  |
| user1   | step2 | 2021-03-16 14:03:47 | desktop |
| user1   | step3 | 2021-03-16 14:03:55 | mobile  |
| user1   | step3 | 2021-03-16 14:04:00 | mobile  |
| user1   | step1 | 2021-03-16 14:04:02 | desktop |
| user1   | step2 | 2021-03-16 14:03:16 | mobile  |
| user1   | step3 | 2021-03-16 14:04:07 | mobile  |
| user1   | step4 | 2021-03-16 14:04:08 | desktop |
| user1   | step4 | 2021-03-16 14:04:09 | tablet  |
+---------+-------+---------------------+---------+

The expected result:

+---------+-------+---------------------+---------+---------------------+
| user_id | step  |     created_at      | device  | device_concatenated |
+---------+-------+---------------------+---------+---------------------+
| user1   | step1 | 2021-03-16 14:03:16 | mobile  | mobile > desktop    |
| user1   | step2 | 2021-03-16 14:03:16 | mobile  | mobile > desktop    |
| user1   | step3 | 2021-03-16 14:03:55 | mobile  | mobile > desktop    |
| user1   | step4 | 2021-03-16 14:04:08 | desktop | mobile > desktop    |
+---------+-------+---------------------+---------+---------------------+

Solution

  • https://www.db-fiddle.com/f/ooSmXAxqVHNxqD8sJ6wZfr/0

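    WITH first_seen_per_user_and_device AS (
      -- Earliest step at which each user used each device.
      -- step is text, so min() compares lexicographically ('step10' < 'step2').
      SELECT user_id, device, min(step) AS first_seen_step
      FROM input_data
      GROUP BY user_id, device
    ),
    user_to_devices AS (
      -- One string per user: devices in order of first appearance.
      SELECT user_id,
             array_to_string(array_agg(device ORDER BY first_seen_step), ' > ') AS device_concatenated
      FROM first_seen_per_user_and_device
      GROUP BY user_id
    )
    -- Attach each user's string to every one of that user's rows.
    SELECT input_data.*, device_concatenated
    FROM input_data
    JOIN user_to_devices
      ON user_to_devices.user_id = input_data.user_id;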
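
To cross-check the query outside PostgreSQL, here is a rough Python mirror of its three parts (per-device first step, ordered aggregation, join back), run on the first sample table. ROWS and add_device_concatenated are illustrative names, not part of the answer. Like min(step) in the SQL, it compares step names as text, which holds for step1 to step9 but would place step10 before step2.

```python
from collections import defaultdict

# First sample table from the question: (user_id, step, device).
ROWS = [
    ("7bc6de", "step1", "desktop"),
    ("7bc6de", "step2", "desktop"),
    ("7bc6de", "step3", "mobile"),
    ("7bc6de", "step4", "mobile"),
    ("7bc6de", "step5", "desktop"),
    ("0ee6df", "step1", "tablet"),
    ("0ee6df", "step2", "tablet"),
    ("0ee6df", "step3", "desktop"),
    ("0ee6df", "step4", "desktop"),
    ("0ee6df", "step5", "mobile"),
]


def add_device_concatenated(rows):
    # first_seen_per_user_and_device: min(step) per (user_id, device),
    # comparing steps as text, much like min() on a text column.
    first_seen = {}
    for user_id, step, device in rows:
        key = (user_id, device)
        if key not in first_seen or step < first_seen[key]:
            first_seen[key] = step

    # user_to_devices: each user's devices ordered by first-seen step.
    per_user = defaultdict(list)
    for (user_id, device), step in first_seen.items():
        per_user[user_id].append((step, device))
    paths = {
        user_id: " > ".join(device for _, device in sorted(pairs))
        for user_id, pairs in per_user.items()
    }

    # Final join: attach the user's string to every input row.
    return [(user_id, step, device, paths[user_id]) for user_id, step, device in rows]


for row in add_device_concatenated(ROWS):
    print(row)
```

Sorting on (step, device) breaks ties alphabetically when two devices are first seen at the same step; in the SQL, the order of such ties inside array_agg is unspecified.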
    

If the same user and step can be seen on multiple devices, add another CTE in front that keeps only the row you want (e.g. the earliest one) using SELECT DISTINCT ON:

  • https://www.db-fiddle.com/f/w9ZRvpQ7KXgdVKCTDAb43o/0

    WITH input_data AS (
      -- Keep only the earliest row (by created_at) for each (user_id, step).
      SELECT DISTINCT ON (user_id, step) user_id, step, created_at, device
      FROM input_data_with_created_at
      ORDER BY user_id, step, created_at
    ),
    (...) -- Rest of the CTEs, same as before but with timestamp included.
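
The deduplication step can be sketched the same way in Python on the additional-scenario rows: keep the earliest created_at per (user_id, step), as the DISTINCT ON query does, then apply the same first-seen ordering. This is an illustrative mirror, not the answer's code; the names are hypothetical, and it assumes created_at values share the fixed "YYYY-MM-DD HH:MM:SS" format so that string comparison is chronological.

```python
from collections import defaultdict

# Rows from the additional scenario: (user_id, step, created_at, device).
ROWS = [
    ("user1", "step1", "2021-03-16 14:03:16", "mobile"),
    ("user1", "step2", "2021-03-16 14:04:07", "mobile"),
    ("user1", "step2", "2021-03-16 14:03:47", "desktop"),
    ("user1", "step3", "2021-03-16 14:03:55", "mobile"),
    ("user1", "step3", "2021-03-16 14:04:00", "mobile"),
    ("user1", "step1", "2021-03-16 14:04:02", "desktop"),
    ("user1", "step2", "2021-03-16 14:03:16", "mobile"),
    ("user1", "step3", "2021-03-16 14:04:07", "mobile"),
    ("user1", "step4", "2021-03-16 14:04:08", "desktop"),
    ("user1", "step4", "2021-03-16 14:04:09", "tablet"),
]


def earliest_per_step(rows):
    # Mirrors DISTINCT ON (user_id, step) ... ORDER BY user_id, step, created_at.
    best = {}
    for row in rows:
        key = (row[0], row[1])
        if key not in best or row[2] < best[key][2]:
            best[key] = row
    return [best[key] for key in sorted(best)]


def add_device_concatenated(rows):
    # Same as before: min(step) per (user_id, device), steps compared as text,
    # then each user's devices joined in first-seen order.
    first_seen = {}
    for user_id, step, _, device in rows:
        key = (user_id, device)
        if key not in first_seen or step < first_seen[key]:
            first_seen[key] = step
    per_user = defaultdict(list)
    for (user_id, device), step in first_seen.items():
        per_user[user_id].append((step, device))
    paths = {u: " > ".join(d for _, d in sorted(p)) for u, p in per_user.items()}
    return [row + (paths[row[0]],) for row in rows]


for row in add_device_concatenated(earliest_per_step(ROWS)):
    print(row)
```

On these rows it keeps step1 to step4 with the created_at values and devices shown in the expected result, and every row gets mobile > desktop.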