I have a Postgres table with these columns:
id int8
, user_id varchar
, is_favorite boolean
, join_time timestamptz
I want to delete some rows in this table with some conditions:
user_id
.user_id
's rows with is_favorite=true
(There can't be more than 5 rows with is_favorite=true
per user_id
.)join_time
.I want to delete rows past the 10 per user_id
in this table with multiple user_id
criteria.
There is a topic without user_id
criteria, but related to performance issues, I opened a new topic. For whole table deletion, see:
The script will execute 8 times in about 5 minutes.Table contains about 200K rows and relationol with users
table.(version:PostgreSQL 12.10)
Example:
id|user_id |is_favorite|join_time
------------------------------------+------------------------------------+-------
1 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-04 15:16:40.000 +0300
2 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-03 15:16:25.000 +0300
3 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-02 15:16:40.000 +0300
4 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-04-22 15:16:40.000 +0300
5 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-23 15:16:25.000 +0300
6 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-21 15:16:25.000 +0300
7 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-20 15:16:40.000 +0300
8 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-19 15:16:25.000 +0300
9 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-18 15:16:40.000 +0300
10|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-17 15:16:25.000 +0300
11|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-16 15:16:40.000 +0300
12|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-15 15:16:25.000 +0300
13|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:40.000 +0300
14|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:39.000 +0300
15|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 12:16:25.000 +0300
16|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 11:16:25.000 +0300
17|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 10:16:25.000 +0300
18|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 09:16:25.000 +0300
19|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 08:16:25.000 +0300
20|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 07:16:25.000 +0300
21|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 06:16:25.000 +0300
22|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 05:16:25.000 +0300
23|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 04:16:25.000 +0300
24|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 03:16:25.000 +0300
25|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 02:16:25.000 +0300
26|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 01:16:25.000 +0300
27|91a005bc-cf89-11ed-afa1-0242ac120002|true |2023-01-08 05:27:25.000 +0300
28|91a005bc-cf89-11ed-afa1-0242ac120002|true |2023-02-09 07:32:25.000 +0300
29|91a005bc-cf89-11ed-afa1-0242ac120002|false |2023-02-10 08:51:25.000 +0300
For user_id in '655caab8-ce81-11ed-afa1-0242ac120002, 81c126b6-ce81-11ed-afa1-0242ac120002'
these IDs must be deleted: 11,12,13,14,25,26
What is the most efficient way of deal with this?
Applied to the whole table (all users), the query I suggested to your previous question is faster.
Applied to only a few given users, I suggest:
DELETE FROM tbl t
USING (
SELECT id, row_number() OVER (PARTITION BY user_id
ORDER BY is_favorite DESC, join_time DESC
ROWS UNBOUNDED PRECEDING) AS rn
FROM tbl t
WHERE user_id = ANY ({'655caab8-ce81-11ed-afa1-0242ac120002
,81c126b6-ce81-11ed-afa1-0242ac120002}')
) del
WHERE t.id = del.id
AND del.rn > 10;
Filter early.
An index with leading user_id
will help a lot. (You probably have one.)
An index with leading id
should exist in any case.
Other considerations as detailed in my previous answer: