postgresql indexing query-optimization internals database-engine

How does PostgreSQL combine the use of multiple indexes via AND?

How does PostgreSQL combine the use of two btree indexes via the AND operator?

I find this page in the Postgres documentation to be highly surprising, to the point of implausibility:

https://www.postgresql.org/docs/current/indexes-bitmap-scans.html

This page seems to indicate that when a query contains a clause WHERE "col1" = 'A' AND "col2" = 'B', if both col1 and col2 are indexed, then a bitmap (apparently consisting of one bit for every row) is set up in memory for each of the two column indexes, and these are ANDed together to find the result set of matching rows.

This seems extremely implausible to me, particularly for btree indexes, because this would mean that query time and memory consumption would both scale linearly in the number of rows in the database, regardless of the number of rows matching a given query, which would make Postgres unusable for many applications.

Please someone tell me that Postgres queries that only sparsely match rows in a table do not scale linearly with the total number of rows in the database!

If two btree indexes index the same table, I would be much more likely to believe that the indexes are combined by somehow intersecting the btree nodes directly (however that would look!). Personally if I were to implement this, I would create a btree containing the unique values in the database, then at each leaf node for each unique value, I would store a heap of the id values that have the given value in the indexed column. Finding the common rows would then be quite easy, by intersecting the leaf node heap for each table.

I checked two different Postgres internals docs to see if I can see how this works, and I haven't yet found the answer I'm looking for...

Solution

You are kind of making up details the documentation intentionally doesn't provide. Bitmaps are only combined within the same table, and within a table only pages which have rows to contribute are present in the data structure. With the present implementation, each block which contributes any rows takes about 64 bytes. Since they are purely in memory structures, the implementation can change freely between versions. The docs usually don't go into details on such things, as keeping the docs up to date with changes in implementation is very annoying.