Search code examples
sqlpostgresqlpostgresql-13

How can I improve this query without HAVING Clause


There's two tables:

  1. users
  2. documents

users:

CREATE TABLE IF NOT EXISTS users (
    id         SERIAL PRIMARY KEY,
    username   TEXT NOT NULL UNIQUE
)

documents:

CREATE TABLE IF NOT EXISTS documents (
    id         SERIAL PRIMARY KEY,
    user_id    INT NOT NULL REFERENCES users,
    name       TEXT NOT NULL,
    value      INT NOT NULL 
)

I want to filter users by document name and value. Usually, between 2-5 document name and value is filtered. And there are apprx 6-10 documents per user.

I have a huge database and want to improve this query. I think I can get faster query without the HAVING clause. Any help highly appreciated. I use PostgreSQL 13.

The query I am using:

SELECT
    users.username,
    jsonb_agg(jsonb_strip_nulls(jsonb_build_object('name', documents.name, 'value', documents.value))) AS docs
FROM
    users
JOIN
    documents
ON
    users.id = documents.user_id
GROUP BY
    users.username
HAVING
    jsonb_agg(jsonb_build_object('name', documents.name, 'value', documents.value)) @? '$[*] ? (@.name == "doc1") ? (@.value == "2")'

Solution

  • For big tables it's extremely expensive to join and aggregate all rows before finally filtering the few qualifying rows.

    It should be faster by orders of magnitude to filter qualifying documents first, then grab all documents for the same user(s), aggregate, and finally join to users:

    SELECT u.username, d.docs
    FROM  (
       SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
       FROM   documents d1
       JOIN   documents d USING (user_id)
       WHERE  d1.name = 'doc1'
       AND    d1.value = 2
       -- AND    d.name  IS NOT NULL  -- strip NULLs early 
       -- AND    d.value IS NOT NULL  -- if not defined NOT NULL anyway
       GROUP  BY 1
       ) d
    JOIN   users u ON u.id = d.user_id;
    

    While being at it, I removed jsonb_strip_nulls() since all processed columns are defined NOT NULL anyway. Also cheaper.

    Possibly simplify to just jsonb_build_object(d.name, d.value).

    For the first step, an index on documents(name, value) will help a lot. Maybe even on documents(name, value, user_id) to get index-only scans (depends).

    It should be safe to assume there is also an index on documents(user_id). Helps the next step. Again, documents(user_id, name, value) for index-only scans.

    And, finally, an index on users(id). Should be a given. Again, users(id, username) for index-only scans.

    If (name, value) is not UNIQUE per user (like it seems to be the case), use EXISTS instead to avoid duplicates:

    SELECT u.username, d.docs
    FROM  (
       SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
       FROM   documents d
       WHERE  EXISTS (
          SELECT FROM documents d1
          WHERE  d1.user_id = d.user_id
          AND    d1.name = 'doc1'
          AND    d1.value = 2
          )
       GROUP  BY 1
       ) d
    JOIN   users u ON u.id = d.user_id;
    

    Similar query plan, same indexes can be used.

    Related: