postgresql · statistics · histogram · sql-execution-plan

Row estimation for JOIN queries


How does PostgreSQL estimate the number of rows in a JOIN query like:

EXPLAIN 
SELECT * 
FROM R, S 
WHERE (R.StartTime < S.EndTime) AND (S.StartTime < R.EndTime);

Solution

  • There is a chapter in the manual addressing your question exactly, with an explanation for what Laurenz provided, among other things.

    But that wasn't the full story yet. We also need the row counts (cardinalities) of the underlying tables. Postgres uses estimate_rel_size(), defined in src/backend/utils/adt/plancat.c:

     /*
      * estimate_rel_size - estimate # pages and # tuples in a table or index
      *
      * We also estimate the fraction of the pages that are marked all-visible in
      * the visibility map, for use in estimation of index-only scans.
      *
      * If attr_widths isn't NULL, it points to the zero-index entry of the
      * relation's attr_widths[] cache; we fill this in if we have need to compute
      * the attribute widths for estimation purposes.
      */
     void
     estimate_rel_size(Relation rel, int32 *attr_widths,
                       BlockNumber *pages, double *tuples, double *allvisfrac)
     ...
    

    Here is a minimal SQL query to reproduce the calculation (ignoring some corner cases):

    SELECT (reltuples / relpages * (pg_relation_size(oid) / 8192))::bigint
    FROM   pg_class
    WHERE  oid = 'mytable'::regclass;  -- your table here
    
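    The extrapolation behind that query can be sketched in a few lines of Python (a simplified model of estimate_rel_size, not the actual C code; the function name is illustrative, the default 8192-byte block size is assumed, and corner cases such as relpages = 0 are ignored):

    ```python
    # Simplified model of the planner's row-count extrapolation:
    # scale the tuple density recorded at the last VACUUM/ANALYZE
    # to the table's *current* physical size.

    BLCKSZ = 8192  # default PostgreSQL block size (compile-time setting)

    def estimate_tuples(reltuples: float, relpages: int, rel_size_bytes: int) -> int:
        """Extrapolate the current tuple count from stale pg_class statistics."""
        density = reltuples / relpages        # tuples per page at last VACUUM
        cur_pages = rel_size_bytes // BLCKSZ  # pages the relation occupies now
        return round(density * cur_pages)

    # If a table grew from 100 to 120 pages since statistics said 10000 tuples,
    # the planner scales the estimate up proportionally:
    print(estimate_tuples(10000, 100, 120 * 8192))  # 12000
    ```

    This is why the estimate below tracks inserts that happened after VACUUM: the file grows, and the density-based extrapolation picks that up.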

    More details:

    Example

    CREATE TEMP TABLE r(id serial, start_time timestamptz, end_time timestamptz);
    CREATE TEMP TABLE s(id serial, start_time timestamptz, end_time timestamptz);
    
    INSERT INTO r(start_time, end_time)
    SELECT now(), now()  -- actual values don't matter for this particular case
    FROM generate_series (1, 5000);
    
    INSERT INTO s(start_time, end_time)
    SELECT now(), now()
    FROM generate_series (1, 10000);
    
    VACUUM r, s;  -- set reltuples & relpages in pg_class
    
    -- add 2000 rows to S
    INSERT INTO s(start_time, end_time)
    SELECT now(), now()
    FROM generate_series (1, 2000);
    

    pg_class still says reltuples = 5000 and 10000, but we know there are actually 5000 and 12000 rows in r and s. (Since these are temporary tables, they are not covered by autovacuum, so the numbers are never updated automatically.) Check:

    SELECT relname, reltuples, relpages  -- 5000 | 10000
    FROM   pg_class c
    WHERE  c.oid IN ('pg_temp.r'::regclass, 'pg_temp.s'::regclass);
    
    SELECT count(*) FROM r; -- 5000
    SELECT count(*) FROM s; -- 12000
    

    Query plan:

    EXPLAIN
    SELECT *
    FROM r, s
    WHERE (r.start_time < s.end_time) AND (s.start_time < r.end_time);
    
    'Nested Loop  (cost=0.00..1053004.31 rows=6683889 width=40)'
    '  Join Filter: ((r.start_time < s.end_time) AND (s.start_time < r.end_time))'
    '  ->  Seq Scan on s  (cost=0.00..197.31 rows=12031 width=20)'
    '  ->  Materialize  (cost=0.00..107.00 rows=5000 width=20)'
    '        ->  Seq Scan on r  (cost=0.00..82.00 rows=5000 width=20)'
    'JIT:'
    '  Functions: 6'
    '  Options: Inlining true, Optimization true, Expressions true, Deforming true'
    

    Postgres estimates rows=12031 for table s. A pretty good estimate: the algorithm worked.
    The estimate is more easily thrown off by deleting rows, as the physical size of the table doesn't shrink automatically. It's a good idea to run VACUUM ANALYZE after a major DELETE, or even VACUUM FULL ANALYZE. See:

    Postgres expects rows=6683889, which matches our expectation (as per Laurenz' explanation): each of the two inequality predicates gets the default selectivity for inequalities, 0.3333333333333333 (DEFAULT_INEQ_SEL in src/include/utils/selfuncs.h):

    SELECT 5000 * 12031 * 0.3333333333333333^2  -- 6683888.89
    
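    To double-check the arithmetic outside SQL (a quick sketch; the variable names are illustrative):

    ```python
    # Two inequality join predicates, each with the default selectivity
    # DEFAULT_INEQ_SEL = 1/3, multiplied together (independence assumed).
    DEFAULT_INEQ_SEL = 0.3333333333333333

    rows_r, rows_s = 5000, 12031   # planner's input cardinalities from above
    estimate = rows_r * rows_s * DEFAULT_INEQ_SEL ** 2
    print(round(estimate))  # 6683889 - matches the rows= figure in the plan
    ```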

    Better query

    Your example query is just that: an example. But it happens to be a poor one, as the same can be achieved more efficiently with range types and operators. Specifically with tstzrange and the overlap operator &&:

    Selectivity for &&?

    SELECT oprjoin  -- areajoinsel
    FROM pg_operator
    WHERE oprname = '&&'
    AND oprleft = 'anyrange'::regtype
    AND oprright = 'anyrange'::regtype;
    

    The source code in src/backend/utils/adt/geoselfuncs.c:

     Datum
     areajoinsel(PG_FUNCTION_ARGS)
     {
         PG_RETURN_FLOAT8(0.005);
     }
    

    Much more selective: 0.005 << 0.333! And typically more realistic.

    EXPLAIN
    SELECT *
    FROM r, s
    WHERE tstzrange(r.start_time, r.end_time) && tstzrange(s.start_time, s.end_time);
    

    This happens to be exactly equivalent, since tstzrange defaults to including the lower bound and excluding the upper bound. I get this query plan:

    'Nested Loop  (cost=0.00..1203391.81 rows=300775 width=40)'
    '  Join Filter: (tstzrange(r.start_time, r.end_time) && tstzrange(s.start_time, s.end_time))'
    '  ->  Seq Scan on s  (cost=0.00..197.31 rows=12031 width=20)'
    '  ->  Materialize  (cost=0.00..107.00 rows=5000 width=20)'
    '        ->  Seq Scan on r  (cost=0.00..82.00 rows=5000 width=20)'
    'JIT:'
    '  Functions: 6'
    '  Options: Inlining true, Optimization true, Expressions true, Deforming true'
    

    Our expectation:

    SELECT 5000 * 12031 * 0.005  -- 300775.000
    
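    The same check in a couple of lines (variable names are illustrative):

    ```python
    # The range path applies the constant join selectivity 0.005
    # (returned by areajoinsel) once, for the single && predicate.
    JOIN_SEL = 0.005

    rows_r, rows_s = 5000, 12031
    print(round(rows_r * rows_s * JOIN_SEL))  # 300775 - matches the plan
    ```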

    It's a Bingo!
    And this query can be supported with an index efficiently, changing the game ...