Tags: postgresql, query-optimization, sql-execution-plan, postgresql-performance

Terrible performance joining a view to a fast subquery / CTE in PostgreSQL


I have a PostgreSQL (v15) DB view that rolls up a bunch of data per user for a single organization, to display a report of data like fees owed/paid per user, etc. This is executed with an org ID and date range as inputs, and performs sub-second, which is perfectly fast for this use case (a UI report). So far so good.

For some orgs, I also need to produce a summary of THOSE summaries, i.e. a rollup for groups of organizations that summarizes the same data per user, per organization, for a parent organization. High-level, I am trying to select the target orgs first (which should produce a set of < 150 rows), then join my existing single-org view to that set to add in the aggregated data from the original view, but per collected organization.

While the real query deals with a lot more columns and aggregations in the output, this reduced version summarizes the core query logic:

WITH member_orgs AS (
  -- CTE returns 133 rows in ~30ms
  SELECT
    bp.id AS billing_period_id,
    bp.started_on AS billing_period_started_on,
    bp.ended_on AS billing_period_ended_on,
    org.name AS organization_name,
    bp.organization_id
  FROM billing_periods bp
    JOIN organizations org
      ON org.id = bp.organization_id
  WHERE
    bp.paid_by_organization_id = 123
    AND bp.started_on >= '2023-07-01'
    AND bp.ended_on <= '2024-06-30'
    AND bp.organization_id != 123
)
SELECT
  member_orgs.billing_period_id,
  member_orgs.billing_period_started_on,
  member_orgs.billing_period_ended_on,
  member_orgs.organization_name,
  -- this is one example aggregation; the real query has more of these:
  SUM(CASE WHEN details.received_amount > 0 THEN 1 ELSE 0 END) AS payments_received_count
FROM member_orgs
  LEFT JOIN per_athlete_fee_details_view details
    -- SLOW (~40 SECONDS):
    -- ON details.billing_period_id = member_orgs.billing_period_id
    --   AND details.organization_id = member_orgs.organization_id
    -- FAST (~150ms):
    ON details.billing_period_id = 1234
      AND details.organization_id = 3456
GROUP BY
  member_orgs.billing_period_id,
  member_orgs.billing_period_started_on,
  member_orgs.billing_period_ended_on,
  member_orgs.organization_name;

The view being joined is fairly complex, and relies on some sub-views as well, but when executed in isolation it's very fast. The member_orgs CTE is also very fast on its own (~30ms), and always results in < 150 records. As shown above, if I join the two on specific IDs (as a test), the overall query is extremely fast (~150ms). However, when joining on the columns between the CTE and the view (what I need to do), overall performance tanks to 40+ seconds.

I feel like I must be missing something silly, as I don't understand how joining the view to a set of 133 records (in the real case I'm debugging) could explode the time so dramatically. My understanding was that the CTE would materialize its output, allowing the outer join to work on just that result set, which I would think to be very efficient. I could write application code to run the CTE, then iterate over the IDs and execute the outer query individually 133 times in far less time than this query is taking.
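For what it's worth, that "run it 133 times" idea can be expressed in SQL itself with a LATERAL join, which asks the planner to evaluate the view's subquery once per member_orgs row. A sketch against the reduced query above (whether the planner actually executes it that way depends on the view's internals):

```sql
-- member_orgs is the same CTE defined in the query above.
SELECT
  member_orgs.billing_period_id,
  member_orgs.organization_name,
  details.payments_received_count
FROM member_orgs
  LEFT JOIN LATERAL (
    -- Correlated subquery: runs once per member_orgs row,
    -- with that row's IDs available as constants.
    SELECT SUM(CASE WHEN d.received_amount > 0 THEN 1 ELSE 0 END)
             AS payments_received_count
    FROM per_athlete_fee_details_view d
    WHERE d.billing_period_id = member_orgs.billing_period_id
      AND d.organization_id   = member_orgs.organization_id
  ) details ON TRUE;
```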

Please pardon the huge query plans, as the real queries (with underlying views) are pretty complicated, but these were created with a slightly more complex version of the reduced query example shown above (though the logic of it is the same). The only difference between the two runs was using specific IDs, vs joining on columns, exactly as shown in the example code above.
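For anyone reproducing this, both plans were captured the standard way, by prefixing the query with EXPLAIN (ANALYZE, BUFFERS), which reports actual row counts, timings, and buffer usage per plan node:

```sql
-- Runs the query and reports per-node actual timings and buffer I/O:
EXPLAIN (ANALYZE, BUFFERS)
WITH member_orgs AS (
  -- ...CTE body as in the query above...
)
SELECT
  -- ...outer query as above...
;
```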

Thanks in advance, and let me know if I can provide any additional details.


Solution

  • We ended up converting these views into a series of materialized CTEs, and inverting some of the dependencies between them to reduce duplication. The base DB views were originally intended to be reusable, but that reuse ultimately became more of a hindrance than a benefit, so for the sake of performance we ended up with a (large) block of raw SQL that we execute directly from the Rails code. Not ideal for maintenance purposes, but we now get sub-second SQL execution for all use cases.
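A minimal sketch of the shape this landed on (table and column names below are placeholders for the real view bodies, which are much larger). Note that since PostgreSQL 12, a CTE referenced only once is inlined into the outer query by default; the MATERIALIZED keyword forces each intermediate result to be computed exactly once and then reused:

```sql
WITH member_orgs AS MATERIALIZED (
  -- the ~133-row set of billing periods, as in the original CTE
  SELECT bp.id AS billing_period_id, bp.organization_id,
         org.name AS organization_name
  FROM billing_periods bp
  JOIN organizations org ON org.id = bp.organization_id
  WHERE bp.paid_by_organization_id = 123
    AND bp.started_on >= '2023-07-01'
    AND bp.ended_on <= '2024-06-30'
    AND bp.organization_id != 123
),
fee_details AS MATERIALIZED (
  -- roughly what per_athlete_fee_details_view computed, but inlined
  -- and restricted up front to just the billing periods we care about
  SELECT d.billing_period_id, d.organization_id, d.received_amount
  FROM fee_detail_rows d  -- placeholder for the view's underlying tables
  WHERE d.billing_period_id IN (SELECT billing_period_id FROM member_orgs)
)
SELECT
  mo.billing_period_id,
  mo.organization_name,
  SUM(CASE WHEN fd.received_amount > 0 THEN 1 ELSE 0 END) AS payments_received_count
FROM member_orgs mo
LEFT JOIN fee_details fd
  ON fd.billing_period_id = mo.billing_period_id
 AND fd.organization_id = mo.organization_id
GROUP BY mo.billing_period_id, mo.organization_name;
```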