Tags: postgresql, query-optimization

What causes PostgreSQL to avoid an index on a join when it uses that index if the data is in the WHERE clause?


I have an index of this sort:

id, start_time, end_time

I have a very large table with information I need to retrieve from it. I have a secondary table with:

id, min_time, max_time

That basically corresponds to time ranges I need to retrieve from the large table. The id in both cases matches up.
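For concreteness, a minimal sketch of the two tables (the table names, column types, and index name here are assumptions for illustration; the original post only gives the column names):

```sql
-- Hypothetical large table; the real one carries more payload columns.
CREATE TABLE big_table (
    id         integer,
    start_time timestamptz,
    end_time   timestamptz
    -- ... payload columns ...
);
CREATE INDEX big_table_idx ON big_table (id, start_time, end_time);

-- Small driving table built per query.
-- Note: no NOT NULL constraints on any column -- this turns out to matter.
CREATE TABLE ranges (
    id       integer,
    min_time timestamptz,
    max_time timestamptz
);
```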

When I do a join with the two tables, PostgreSQL decides that it is a fantastic idea to do a table scan, then join with the small table in a nested loop. So if, for example, I had 3 records in the small table, it will not utilize the index to filter down to those 3 ids; instead it does things the hard way.
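The join in question looks roughly like this (using the sketched schema above; the exact predicates are assumptions):

```sql
-- This form produces a sequential scan on big_table instead of
-- an index scan driven by the handful of rows in ranges:
EXPLAIN ANALYZE
SELECT b.*
FROM ranges r
JOIN big_table b
  ON  b.id         =  r.id
  AND b.start_time >= r.min_time
  AND b.end_time   <= r.max_time;
```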

I'm actually constructing the small table in Python prior to the query, so I can alternatively just hard-code the conditions in the WHERE clause, or at the very least, put the ids in the WHERE clause and do the join. If I do this, it actually uses the index as expected.
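The WHERE-clause variant that does use the index would look something like this (the literal ids are hypothetical, standing in for values already known to the Python code):

```sql
-- Listing the ids explicitly lets the planner pick the index:
SELECT b.*
FROM big_table b
JOIN ranges r ON b.id = r.id
WHERE b.id IN (101, 102, 103)       -- ids known in Python beforehand
  AND b.start_time >= r.min_time
  AND b.end_time   <= r.max_time;
```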

My question is: why does Postgres not utilize the index on the join when it clearly can in this case? This isn't a case of needing to run ANALYZE; that does not help. It's an order-of-magnitude performance difference. It needs to start with the small table, then join the larger table, but I can't get it to do that even by selecting from the small table first.

EDIT before submitting:

I figured it out: the small table definition was missing NOT NULL on the column definitions. Once I added it, the query works as expected.

So - leaving this out here for anyone else who may come across this very frustrating problem.


Solution

  • I figured it out, the small table definition was missing NOT NULL on the column definitions. If I added this, it works as expected. Seems like a bug in the optimizer tbh. All columns being joined on allowed NULLs.
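In terms of the sketched schema above, the fix amounts to adding the constraint to the join columns after the fact (ALTER TABLE ... SET NOT NULL is the standard Postgres syntax for this):

```sql
-- Declare the columns used in the join NOT NULL so the planner
-- can choose the index-driven plan:
ALTER TABLE ranges ALTER COLUMN id       SET NOT NULL;
ALTER TABLE ranges ALTER COLUMN min_time SET NOT NULL;
ALTER TABLE ranges ALTER COLUMN max_time SET NOT NULL;
```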