I know some basics of BTree data structure, but I want to know how actual data is referenced in pgsql. Let's take boolean
for example, if we only have two values in a simple BTree (true
/false
) - it is obvious, we would just have two nodes (actually one node with two values, but let's talk super simple), but in pgsql we are also storing references to the actual data. How would such an index look like in pgsql structurally?
Would it be like two nodes (for true/false) with some list of references to the actual rows, or would it be like head with false
, and two linked lists attached to it (false for left and true for right or something like that)?
This question might be very stupid, but I could not find anything useful on google.
A B-tree index has one entry per table row. So if your table has 100 million rows, the index will have 100 million entries. This is mitigated to some degree by index de-duplication, introduced in v13.
It may be that a GIN index would be smaller (requires extension btree_gin
), but it would be more expensive to modify. The GIN index would only have two entries, but each would have a large posting tree.
Very often, an index on a boolean
column is not useful.