There's two tables:
users
:
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
username TEXT NOT NULL UNIQUE
)
documents
:
CREATE TABLE IF NOT EXISTS documents (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users,
name TEXT NOT NULL,
value INT NOT NULL
)
I want to filter users
by document name
and value
. Usually, between 2-5 document name
and value
is filtered. And there are apprx 6-10 documents per user.
I have a huge database and want to improve this query. I think I can get faster query without the HAVING
clause. Any help highly appreciated. I use PostgreSQL 13.
The query I am using:
SELECT
users.username,
jsonb_agg(jsonb_strip_nulls(jsonb_build_object('name', documents.name, 'value', documents.value))) AS docs
FROM
users
JOIN
documents
ON
users.id = documents.user_id
GROUP BY
users.username
HAVING
jsonb_agg(jsonb_build_object('name', documents.name, 'value', documents.value)) @? '$[*] ? (@.name == "doc1") ? (@.value == "2")'
For big tables it's extremely expensive to join and aggregate all rows before finally filtering the few qualifying rows.
It should be faster by orders of magnitude to filter qualifying documents first, then grab all documents for the same user(s), aggregate, and finally join to users:
SELECT u.username, d.docs
FROM (
SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
FROM documents d1
JOIN documents d USING (user_id)
WHERE d1.name = 'doc1'
AND d1.value = 2
-- AND d.name IS NOT NULL -- strip NULLs early
-- AND d.value IS NOT NULL -- if not defined NOT NULL anyway
GROUP BY 1
) d
JOIN users u ON u.id = d.user_id;
While being at it, I removed jsonb_strip_nulls()
since all processed columns are defined NOT NULL anyway. Also cheaper.
Possibly simplify to just jsonb_build_object(d.name, d.value)
.
For the first step, an index on documents(name, value)
will help a lot. Maybe even on documents(name, value, user_id)
to get index-only scans (depends).
It should be safe to assume there is also an index on documents(user_id)
. Helps the next step. Again, documents(user_id, name, value)
for index-only scans.
And, finally, an index on users(id)
. Should be a given. Again, users(id, username)
for index-only scans.
If (name, value)
is not UNIQUE
per user (like it seems to be the case), use EXISTS
instead to avoid duplicates:
SELECT u.username, d.docs
FROM (
SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
FROM documents d
WHERE EXISTS (
SELECT FROM documents d1
WHERE d1.user_id = d.user_id
AND d1.name = 'doc1'
AND d1.value = 2
)
GROUP BY 1
) d
JOIN users u ON u.id = d.user_id;
Similar query plan, same indexes can be used.
Related: