
Why does GPDB not consider the number of segments in cost estimation?


In every version of the GPDB open-source code, the number of segments does not seem to be taken into account in cost estimation. It only appears in some simple processing when the EXPLAIN result is returned to the QD, to make the output clearer.


Solution

  • Actually, Greenplum does use the number of segments during planning. Statistics are stored on the master as global (cluster-wide) values, and Greenplum uses this global information to derive the local statistics on each segment.

    Some examples are listed below:

    1. Function adjust_reloptinfo (for the broader context, see the PR: https://github.com/greenplum-db/gpdb/pull/10676)
    2. Function estimate_num_groups_on_segment estimates the number of distinct values local to one segment, given the cluster size and the global number of distinct values.
    3. The cost model of a Motion node takes the cluster size into account: cdbpath_cost_motion
    4. ...
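To illustrate how global statistics can be turned into per-segment ones, here is a minimal sketch. It is NOT the code of estimate_num_groups_on_segment; the function names and parameters are hypothetical, and the non-key case uses a textbook approximation (assumed here): a segment's rows are treated as a uniform random sample of m rows drawn from D equally frequent values, so the expected number of distinct values is D * (1 - (1 - 1/D)^m).

```python
def rows_per_segment(global_rows: float, num_segments: int) -> float:
    """Hypothetical helper: spread a cluster-wide row count evenly over segments."""
    return max(1.0, global_rows / num_segments)


def distinct_per_segment(global_ndistinct: float, global_rows: float,
                         num_segments: int, distributed_by_key: bool) -> float:
    """Hypothetical estimate of a column's distinct values on one segment."""
    if global_ndistinct <= 1.0:
        return 1.0
    if distributed_by_key:
        # Rows sharing a value hash to the same segment, so the
        # distinct values are split among the segments.
        return max(1.0, global_ndistinct / num_segments)
    # Assumption: the segment's m rows behave like a uniform random sample
    # (with replacement) from D equally frequent values.
    d = global_ndistinct
    m = rows_per_segment(global_rows, num_segments)
    return d * (1.0 - (1.0 - 1.0 / d) ** m)


# 1,000,000 rows, 1,000 distinct values, 4 segments.
print(distinct_per_segment(1000, 1_000_000, 4, distributed_by_key=True))
print(distinct_per_segment(1000, 1_000_000, 4, distributed_by_key=False))
```

The two calls show why the segment count matters: if the table is distributed by the grouping column, each segment sees about a quarter of the distinct values, but if it is distributed by some other column, each segment sees almost all of them, so a per-segment aggregate produces far more groups than a naive "global / segments" estimate would suggest.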
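Likewise, the effect of cluster size on Motion cost can be sketched with a simplified model. This is a hypothetical illustration, not a transcription of cdbpath_cost_motion: the function name, the "rows are per-segment" convention, the averaging of sent and received rows, and the cost_per_row default are all assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class MotionEstimate:
    rows_per_recv_segment: float  # rows arriving at each receiving segment
    cost: float                   # cost added by the Motion itself


def motion_cost(rows_per_send_segment: float, send_segments: int,
                recv_segments: int, broadcast: bool,
                cost_per_row: float = 0.02) -> MotionEstimate:
    """Hypothetical Motion cost model that depends on the segment counts."""
    # Total rows leaving all sending segments.
    total_rows = rows_per_send_segment * send_segments
    if broadcast:
        # Every receiver gets a full copy of the data.
        recv_rows = total_rows
    else:
        # Gather (recv_segments == 1) or redistribute: rows are split.
        recv_rows = max(1.0, total_rows / recv_segments)
    # Charge per row, averaging the sending and receiving sides.
    cost = cost_per_row * 0.5 * (rows_per_send_segment + recv_rows)
    return MotionEstimate(recv_rows, cost)


# 1,000 rows per segment: broadcast on 4 vs. 8 segments, and a redistribute.
print(motion_cost(1000, 4, 4, broadcast=True))
print(motion_cost(1000, 8, 8, broadcast=True))
print(motion_cost(1000, 4, 4, broadcast=False))
```

Even in this toy model, the cost of a broadcast grows with the number of segments while a redistribute does not, which is the kind of trade-off the planner can only weigh if it knows the cluster size.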