Search code examples
postgresqlexplainsql-execution-plan

In a PostgreSQL query plan, what is the difference between Materialize and Hash?


Here's an example of a query plan with Materialize:

Nested Loop  (cost=4.65..49.46 rows=33 width=488)
   Join Filter: (t1.hundred < t2.hundred)
   ->  ... (outer)
   ->  Materialize  (cost=0.29..8.51 rows=10 width=244)
         ->  ... (inner)

And here's an example of a query plan with Hash:

Hash Join  (cost=230.47..713.98 rows=101 width=488)
   Hash Cond: (t2.unique2 = t1.unique2)
   ->  ... (outer)
   ->  Hash  (cost=229.20..229.20 rows=101 width=244)
         ->  ... (inner)

In the second example the results of the inner query are loaded into a hash table and for each of the outer rows the Hash Join will do a hash table lookup.

In the first example the results of the inner are also loaded into memory (that's what Materialize means, isn't it?), and for each outer row the Nested Loop Join also has to do a lookup for the correct row in the materialized data. What kind of data structure does it use for that? (Clearly not a hash table.)


Solution

  • After consulting the source I see that Materialize is basically just a contiguous cache of rows (a tuplestore) that is constantly rewound and iterated again for each of the outer rows.