I have a Postgres table with these columns:
id int8
, user_id varchar
, is_favorite boolean
, join_time timestamptz
I want to delete some rows in this table with some conditions:
user_id
.user_id
's rows with is_favorite=true
is_favorite=true
per user_id
.)join_time
.I want to delete rows past the 10 per user_id in this table.
id|user_id |is_favorite|join_time
------------------------------------+------------------------------------+-------
1 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-04 15:16:40.000 +0300
2 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-03 15:16:25.000 +0300
3 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-02 15:16:40.000 +0300
4 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-04-22 15:16:40.000 +0300
5 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-23 15:16:25.000 +0300
6 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-21 15:16:25.000 +0300
7 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-20 15:16:40.000 +0300
8 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-19 15:16:25.000 +0300
9 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-18 15:16:40.000 +0300
10|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-17 15:16:25.000 +0300
11|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-16 15:16:40.000 +0300
12|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-15 15:16:25.000 +0300
13|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:40.000 +0300
14|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:39.000 +0300
15|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 12:16:25.000 +0300
16|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 11:16:25.000 +0300
17|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 10:16:25.000 +0300
18|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 09:16:25.000 +0300
19|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 08:16:25.000 +0300
20|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 07:16:25.000 +0300
21|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 06:16:25.000 +0300
22|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 05:16:25.000 +0300
23|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 04:16:25.000 +0300
24|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 03:16:25.000 +0300
25|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 02:16:25.000 +0300
26|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 01:16:25.000 +0300
For user_id = 655caab8-ce81-11ed-afa1-0242ac120002
these IDs must be deleted: 11,12,13,14
For user_id = 81c126b6-ce81-11ed-afa1-0242ac120002
these IDs must be deleted 25,26.
Since you are processing the whole table, using a simple subquery with row_number()
should be fastest:
DELETE FROM tbl t
USING (
SELECT id, row_number() OVER (PARTITION BY user_id
ORDER BY is_favorite DESC, join_time DESC
ROWS UNBOUNDED PRECEDING) AS rn
FROM tbl t
) del
WHERE t.id = del.id
AND del.rn > 10;
Adding ROWS UNBOUNDED PRECEDING
is optional, but should make it substantially faster (until Postgres 16 is released). See:
Applying the right sort order, this skips the top 10 of most desirable rows per user and deletes the rest.
true
sorts before false
in descending order. See:
If there can be null
values, you need to do more. Like, first of all clarify your question.
Obviously, there would be race conditions with concurrent writes. If there can be concurrent write load, take a write lock on the table in the same transaction first ...
If that's going to delete the majority of rows, it may be cheaper to create a new table of survivors instead ...
There are other ways. Like:
Aside: use type uuid
for your user_id
column. Much better. See: