Here is a simple query to get the projects
list ordered by participations
count.
SELECT projects.*, COUNT(participations) as ct from projects
INNER JOIN participations ON participations.project_id=projects.id
GROUP BY projects.id
ORDER BY ct DESC
LIMIT 5;
I'm not astonished it's slow (2 seconds). A simple query without ordering by participations
count, is super speed (it's normal, ~10ms).
There are ~2k projects
and ~430k participations
.
What should I do to speed up this request? I feel I have to add an index, but I don't know which one(s?) and how.
EDIT:
Table paticipations
:
create table participations
(
id serial not null
constraint participations_pkey
primary key,
project_id integer not null
constraint participations_project_id_foreign
references projects
constraint participations_service_id_foreign
references projects,
status varchar(32) default 'unknown'::character varying not null,
data jsonb,
created_at timestamp(0),
updated_at timestamp(0),
deleted_at timestamp(0)
);
create index participations_status_index on participations (status);
create index participation_data_idx on participations (data);
Table projects
:
create table projects
(
id serial not null
constraint services_pkey
primary key,
user_id integer not null
constraint services_user_id_foreign
references users,
type varchar(50) not null,
name varchar(255) not null,
url varchar(255) not null,
start_at timestamp(0),
end_at timestamp(0),
visibility varchar(255) not null,
status varchar(255) not null,
metadata jsonb not null,
data jsonb not null,
created_at timestamp(0),
updated_at timestamp(0),
deleted_at timestamp(0)
);
create index services_type_index on projects (type);
create index services_name_index on projects (name);
create index services_url_index on projects (url);
create index services_start_at_index on projects (start_at);
create index services_end_at_index on projects (end_at);
create index services_visibility_index on projects (visibility);
create index services_status_index on projects (status);
Query plan:
Limit (cost=487359.41..487359.43 rows=5 width=2218) (actual time=2177.264..2177.265 rows=5 loops=1)
Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
-> Sort (cost=487359.41..487366.68 rows=2907 width=2218) (actual time=2177.262..2177.262 rows=5 loops=1)
Sort Key: (count(participations.*)) DESC
Sort Method: top-N heapsort Memory: 33kB
Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
-> GroupAggregate (cost=477474.47..487311.13 rows=2907 width=2218) (actual time=1170.136..2175.445 rows=1884 loops=1)
Group Key: projects.id
Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
-> Merge Join (cost=477474.47..485105.80 rows=435252 width=2218) (actual time=1170.122..2120.557 rows=435207 loops=1)
Merge Cond: (projects.id = participations.project_id)
Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
-> Sort (cost=625.30..632.57 rows=2907 width=1131) (actual time=22.443..27.047 rows=2879 loops=1)
Sort Key: projects.id
Sort Method: external sort Disk: 3216kB
Buffers: shared hit=429, temp read=402 written=402
-> Seq Scan on projects (cost=0.00..458.07 rows=2907 width=1131) (actual time=0.011..1.532 rows=2879 loops=1)
Buffers: shared hit=429
-> Materialize (cost=476849.12..479025.38 rows=435252 width=1091) (actual time=1147.671..1996.166 rows=435207 loops=1)
Buffers: shared hit=37017 read=3493, temp read=20649 written=39000
-> Sort (cost=476849.12..477937.25 rows=435252 width=1091) (actual time=1147.666..1400.856 rows=435207 loops=1)
Sort Key: participations.project_id
Sort Method: external merge Disk: 165040kB
Buffers: shared hit=37017 read=3493, temp read=20649 written=20649
-> Seq Scan on participations (cost=0.00..19528.52 rows=435252 width=1091) (actual time=0.012..318.347 rows=435207 loops=1)
Buffers: shared hit=37017 read=3493
Planning time: 1.309 ms
Execution time: 2203.249 ms
The order by
cannot use an index, because it is based on a calculated value.
I might suggest doing the order by
and limit
in a subquery:
SELECT pr.*, p.ct
FROM projects pr JOIN
(SELECT p.project_id, COUNT(*) as ct
FROM participations p
GROUP BY p.project_id
ORDER BY ct DESC
LIMIT 5
) p
ON p.project_id = pr.id;