I have a table1
with timestamps, grouped by an added “Period ID”:
Period ID (uuid) Created At (timestamptz)
A 2017-04-11 11:13:47.997+00
A 2017-04-11 14:42:51.843+00
B 2017-05-21 15:18:12.973+00
B 2017-05-21 15:28:41.054+00
B 2017-05-21 15:28:57.577+00
C 2017-06-11 22:48:00.637+00
D 2017-07-15 14:45:52.343+00
D 2017-07-15 14:47:53.343+00
E 2017-08-25 16:22:57.612+00
F 2017-09-16 00:10:18.577+00
I need to create a new ID “Period ID 2” as a subset of Period ID, such that:
So, the output should be like this:
Period ID (uuid) Created At (timestamptz) Period2 (uuid)
A 2017-04-11 11:13:47.997+00 1
A 2017-04-11 14:42:51.843+00 2
B 2017-05-21 15:18:12.973+00 3
B 2017-05-21 15:28:41.054+00 4
B 2017-05-21 15:28:57.577+00 4
C 2017-06-11 22:48:00.637+00 5
D 2017-07-15 14:45:52.343+00 6
D 2017-07-15 14:47:53.343+00 6
E 2017-08-25 16:22:57.612+00 7
F 2017-09-16 00:10:18.577+00 8
I can add the column, but I am not sure how to populate it. I looked into the following flow: ALTER TABLE
, CREATE FUNCTION
(for trigger function), CREATE TRIGGER
, ... But I wasn’t able to query it properly. Is there any other way to approach this problem?
This does what you ask:
SELECT period_id, created_at, md5(period_id::text || grp)::uuid AS period_id2
FROM (
SELECT *, count(*) FILTER (WHERE step) OVER (PARTITION BY period_id ORDER BY created_at) AS grp
FROM (
SELECT *, (created_at - lag(created_at) OVER (PARTITION BY period_id ORDER BY created_at)) > interval '10 min' AS step
FROM tbl
) sub1
) sub2;
Or with just a single subquery - albeit less readable:
SELECT period_id, created_at
, md5(period_id::text || count(*) FILTER (WHERE step) OVER (PARTITION BY period_id ORDER BY created_at))::uuid AS period_id2
FROM (
SELECT *, (created_at - lag(created_at) OVER (PARTITION BY period_id ORDER BY created_at)) > interval '10 min' AS step
FROM tbl
) sub;
db<>fiddle here
I chose a deterministic way to generate UUIDs. So the Nth group within the same period_id
always gets the same, reproducible UUID.
Basic explanation for this query technique:
About md5()
and UUID:
Maybe you don't actually need UUIDs to begin with? I see them used a lot without need. See:
If you are at liberty to replace the existing table, just creating a new one will be fastest:
CREATE TABLE tbl2 AS
SELECT period_id, created_at
, md5(period_id::text || count(*) FILTER (WHERE step) OVER (PARTITION BY period_id ORDER BY created_at))::uuid AS period_id2
FROM (
SELECT *, (created_at - lag(created_at) OVER (PARTITION BY period_id ORDER BY created_at)) > interval '10 min' AS step
FROM tbl
) sub;
If you write to the table, period_id2
for all involved period_id
have to be recalculated. In case of INSERT
only the ones later than the new created_at
(plus the new row itself). With all the added bloat and vacuum cost to the table, this gets expensive quickly.
Maybe consider a MATERIALIZED VIEW
. But at some point, if the write costs outweigh read cost, it will be cheaper not to persist period_id2
at all and compute it on the fly. Heavily depends on actual read and write patterns.