postgresql sql-execution-plan sql-tuning postgresql-9.6 sql-timestamp

PostgreSQL 9.6 selects a wrong plan during aggregation against timestamp columns

I have a simple but pretty big table “log” that has three columns: user_id, day, hours.

user_id character varying(36) COLLATE pg_catalog."default" NOT NULL,
day timestamp without time zone,
hours double precision

All columns have indexes.

The issue is that aggregation against the 'day' field works extremely slow. For example the simple query takes an eternity to complete.

select min(day) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'

Analysis shows that Postgres makes a full scan filtering the entries that doesn’t relate to user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' what is absolutely counter-intuitive

[
  {
    "Execution Time": 146502.05,
    "Planning Time": 0.893,
    "Plan": {
      "Startup Cost": 789.02,
      "Actual Rows": 1,
      "Plans": [
        {
          "Startup Cost": 0.44,
          "Actual Rows": 1,
          "Plans": [
            {
              "Index Cond": "(log.day IS NOT NULL)",
              "Startup Cost": 0.44,
              "Scan Direction": "Forward",
              "Plan Width": 8,
              "Rows Removed by Index Recheck": 0,
              "Actual Rows": 1,
              "Node Type": "Index Scan",
              "Total Cost": 1395792.54,
              "Plan Rows": 1770,
              "Relation Name": "log",
              "Alias": "log",
              "Parallel Aware": false,
              "Actual Total Time": 146502.015,
              "Output": [
                "log.day"
              ],
              "Parent Relationship": "Outer",
              "Actual Startup Time": 146502.015,
              "Schema": "public",
              "Filter": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
              "Actual Loops": 1,
              "Rows Removed by Filter": 12665610,
              "Index Name": "index_log_day"
            }
          ],
          "Node Type": "Limit",
          "Plan Rows": 1,
          "Parallel Aware": false,
          "Actual Total Time": 146502.016,
          "Output": [
            "log.day"
          ],
          "Parent Relationship": "InitPlan",
          "Actual Startup Time": 146502.016,
          "Plan Width": 8,
          "Subplan Name": "InitPlan 1 (returns $0)",
          "Actual Loops": 1,
          "Total Cost": 789.02
        }
      ],
      "Node Type": "Result",
      "Plan Rows": 1,
      "Parallel Aware": false,
      "Actual Total Time": 146502.019,
      "Output": [
        "$0"
      ],
      "Actual Startup Time": 146502.019,
      "Plan Width": 8,
      "Actual Loops": 1,
      "Total Cost": 789.03
    },
    "Triggers": []
  }
]

And what is more strange the almost similar query works perfectly.

select min(hours) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'

Postgres selects entries for user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' first and then aggregates among them what is obviously correct.

[
  {
    "Execution Time": 5.989,
    "Planning Time": 1.186,
    "Plan": {
      "Partial Mode": "Simple",
      "Startup Cost": 6842.66,
      "Actual Rows": 1,
      "Plans": [
        {
          "Startup Cost": 66.28,
          "Plan Width": 8,
          "Rows Removed by Index Recheck": 0,
          "Actual Rows": 745,
          "Plans": [
            {
              "Startup Cost": 0,
              "Plan Width": 0,
              "Actual Rows": 745,
              "Node Type": "Bitmap Index Scan",
              "Index Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
              "Plan Rows": 1770,
              "Parallel Aware": false,
              "Actual Total Time": 0.25,
              "Parent Relationship": "Outer",
              "Actual Startup Time": 0.25,
              "Total Cost": 65.84,
              "Actual Loops": 1,
              "Index Name": "index_log_user_id"
            }
          ],
          "Recheck Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
          "Exact Heap Blocks": 742,
          "Node Type": "Bitmap Heap Scan",
          "Plan Rows": 1770,
          "Relation Name": "log",
          "Alias": "log",
          "Parallel Aware": false,
          "Actual Total Time": 5.793,
          "Output": [
            "day",
            "hours",
            "user_id"
          ],
          "Lossy Heap Blocks": 0,
          "Parent Relationship": "Outer",
          "Actual Startup Time": 0.357,
          "Total Cost": 6838.23,
          "Actual Loops": 1,
          "Schema": "public"
        }
      ],
      "Node Type": "Aggregate",
      "Strategy": "Plain",
      "Plan Rows": 1,
      "Parallel Aware": false,
      "Actual Total Time": 5.946,
      "Output": [
        "min(hours)"
      ],
      "Actual Startup Time": 5.946,
      "Plan Width": 8,
      "Actual Loops": 1,
      "Total Cost": 6842.67
    },
    "Triggers": []
  }
]

There are two possible work arounds:

1) rewrite query into:

select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id

2) Introduce pair indexes like it was suggested in finding MAX(db_timestamp) query

They could look good but I consider both ways exactly as work arounds (first one is even a hack). Logically if Postgres can select a proper plan for ‘hours’ it have to do it for ‘day’ but it doesn’t. So it looks like a Postgres bug that is occurred during aggregate of timestamp fields, however I admit that I can miss something. Could somebody please advise if something can be done here without using WAs or it’s really a Postgres bug and I have to report it?

UPD: I've reported this as a bug to the PostgreSQL bugs mailing list. I'll let everybody know if it's accepted.

Solution

I've got a reply from PostgreSQL. They do not consider it as a bug. There are possible WAs in this case, many of them are mentioned in the original post and later in the comments. My personal selection is first option mentioned originally because it doesn't require indexes manipulation (it's far not always possible). So the solution is to rewrite query into:

select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id