I want to assign a value to a new column based on the other rows that share the same user_id.
For example, when a user's device changes from desktop to mobile, assign desktop > mobile to all records for that user_id.
When there is more than one distinct change, such as tablet to desktop and then desktop to mobile, assign tablet > desktop > mobile.
Here is the sample data:
+---------+-------+---------+
| user_id | step | device |
+---------+-------+---------+
| 7bc6de | step1 | desktop |
| 7bc6de | step2 | desktop |
| 7bc6de | step3 | mobile |
| 7bc6de | step4 | mobile |
| 7bc6de | step5 | desktop |
| 0ee6df | step1 | tablet |
| 0ee6df | step2 | tablet |
| 0ee6df | step3 | desktop |
| 0ee6df | step4 | desktop |
| 0ee6df | step5 | mobile |
+---------+-------+---------+
Desired output:
+---------+-------+---------+---------------------------+
| user_id | step | device | device_concatenated |
+---------+-------+---------+---------------------------+
| 7bc6de | step1 | desktop | desktop > mobile |
| 7bc6de | step2 | desktop | desktop > mobile |
| 7bc6de | step3 | mobile | desktop > mobile |
| 7bc6de | step4 | mobile | desktop > mobile |
| 7bc6de | step5 | desktop | desktop > mobile |
| 0ee6df | step1 | tablet | tablet > desktop > mobile |
| 0ee6df | step2 | tablet | tablet > desktop > mobile |
| 0ee6df | step3 | desktop | tablet > desktop > mobile |
| 0ee6df | step4 | desktop | tablet > desktop > mobile |
| 0ee6df | step5 | mobile | tablet > desktop > mobile |
+---------+-------+---------+---------------------------+
Additional scenario:
The table contains duplicate steps, i.e. a user can see the same step on different devices at different times. In this case, how can I keep only the first occurrence of each step per user and get the device change shown in the expected result below?
+---------+-------+---------------------+---------+
| user_id | step | created_at | device |
+---------+-------+---------------------+---------+
| user1 | step1 | 2021-03-16 14:03:16 | mobile |
| user1 | step2 | 2021-03-16 14:04:07 | mobile |
| user1 | step2 | 2021-03-16 14:03:47 | desktop |
| user1 | step3 | 2021-03-16 14:03:55 | mobile |
| user1 | step3 | 2021-03-16 14:04:00 | mobile |
| user1 | step1 | 2021-03-16 14:04:02 | desktop |
| user1 | step2 | 2021-03-16 14:03:16 | mobile |
| user1 | step3 | 2021-03-16 14:04:07 | mobile |
| user1 | step4 | 2021-03-16 14:04:08 | desktop |
| user1 | step4 | 2021-03-16 14:04:09 | tablet |
+---------+-------+---------------------+---------+
The expected result:
+---------+-------+---------------------+---------+---------------------+
| user_id | step | created_at | device | device_concatenated |
+---------+-------+---------------------+---------+---------------------+
| user1 | step1 | 2021-03-16 14:03:16 | mobile | mobile > desktop |
| user1 | step2 | 2021-03-16 14:03:16 | mobile | mobile > desktop |
| user1 | step3 | 2021-03-16 14:03:55 | mobile | mobile > desktop |
| user1 | step4 | 2021-03-16 14:04:08 | desktop | mobile > desktop |
+---------+-------+---------------------+---------+---------------------+
https://www.db-fiddle.com/f/ooSmXAxqVHNxqD8sJ6wZfr/0
WITH first_seen_per_user_and_device AS (
    -- Earliest step at which each user was seen on each device.
    SELECT user_id, device, min(step) AS first_seen_step
    FROM input_data
    GROUP BY user_id, device
),
user_to_devices AS (
    -- One row per user: devices joined in first-seen order.
    SELECT user_id,
           array_to_string(
               array_agg(device ORDER BY first_seen_step), ' > '
           ) AS device_concatenated
    FROM first_seen_per_user_and_device
    GROUP BY user_id
)
SELECT input_data.*, device_concatenated
FROM input_data
JOIN user_to_devices
  ON user_to_devices.user_id = input_data.user_id;
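One thing to watch with min(step): if step is a text column, it is compared as text, so 'step10' would sort before 'step2'. The sample data only goes up to step5, so it is not affected. If you can have ten or more steps, a possible variant is sketched below. It assumes every step value is 'step' followed by a number; the alias first_seen_step_no is just a name picked for illustration.

```sql
WITH first_seen_per_user_and_device AS (
    -- Assumption: step looks like 'step<number>'; extract the digits
    -- and compare them as integers instead of as text.
    SELECT user_id, device,
           min(substring(step from '[0-9]+')::int) AS first_seen_step_no
    FROM input_data
    GROUP BY user_id, device
),
user_to_devices AS (
    SELECT user_id,
           array_to_string(
               array_agg(device ORDER BY first_seen_step_no), ' > '
           ) AS device_concatenated
    FROM first_seen_per_user_and_device
    GROUP BY user_id
)
SELECT input_data.*, device_concatenated
FROM input_data
JOIN user_to_devices
  ON user_to_devices.user_id = input_data.user_id;
```

For the sample data this should give the same result as the query above, since step1 to step5 sort the same way as text and as numbers.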
If the same user and step can be seen with multiple devices, add an additional CTE to the WITH clause that keeps only the row you want (e.g. the earliest one), using SELECT DISTINCT ON:
https://www.db-fiddle.com/f/w9ZRvpQ7KXgdVKCTDAb43o/0
WITH input_data AS (
    -- Keep only the earliest row for each (user_id, step).
    SELECT DISTINCT ON (user_id, step) user_id, step, created_at, device
    FROM input_data_with_created_at
    ORDER BY user_id, step, created_at
),
(...) -- Rest of the CTEs, same as before but with timestamp included.
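For reference, here is one way the full query could look once the elided CTEs are filled in. This is a sketch, not necessarily what the fiddle contains: it assumes "with timestamp included" means ordering the devices by their earliest created_at instead of by min(step), and the alias first_seen_at is a name picked for illustration.

```sql
WITH input_data AS (
    -- Keep only the earliest row for each (user_id, step).
    SELECT DISTINCT ON (user_id, step) user_id, step, created_at, device
    FROM input_data_with_created_at
    ORDER BY user_id, step, created_at
),
first_seen_per_user_and_device AS (
    -- Assumption: order devices by when they were first seen.
    SELECT user_id, device, min(created_at) AS first_seen_at
    FROM input_data
    GROUP BY user_id, device
),
user_to_devices AS (
    SELECT user_id,
           array_to_string(
               array_agg(device ORDER BY first_seen_at), ' > '
           ) AS device_concatenated
    FROM first_seen_per_user_and_device
    GROUP BY user_id
)
SELECT input_data.*, device_concatenated
FROM input_data
JOIN user_to_devices
  ON user_to_devices.user_id = input_data.user_id
ORDER BY input_data.user_id, input_data.step;
```

Because the de-duplication happens first, a device that only appears on rows dropped by DISTINCT ON (tablet in the sample above) never reaches the aggregation, which is why the expected result is mobile > desktop.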