Search code examples
sqlpostgresqlpostgis

Is it faster updating based on matching strings or spatial join in postgresql?


I work with PostGIS and quite large data of around 1.5 billion rows, I want to know which update is faster based on matching strings of one column with another column compared to spatial join (ST_Intersects)?

The query I mean is:

UPDATE table_grid AS gd
SET col_grid = col_adm
FROM table_adm de
WHERE ST_Intersects(de.geom, gd.geom);

versus:

UPDATE table_grid AS gd
SET col_grid = col_adm
FROM table_adm de
WHERE gd.col_match1 = de.col_match1
AND gd.col_match2 = de.col_match2
AND gd.col_match3 = de.col_match3;

Please give reasons why one query is faster than the other


Solution

  • As already underlined by the comments, without additional details that's a very abstract question. Based on your previous thread, I'll assume that:

    1. You aim for the two alternative updates to be functionally equivalent: find the same matches for the same rows, transfer the same values.
    2. In both scenarios, you have an adequate index in place.
    3. You're matching geometry shapes of an even grid with much larger, irregular (multi)polygons: enter image description here

    In that case, the three-string match will be typically (much) faster. You can adjust the demo from the old thread and see the reason for yourself: demo

    1. The join on 3 text values is clear, precise and specific - Postgres knows the matched row based on the index alone. Text-to-text comparison in the Index Cond is fast and there's no Filter stage required.
    2. In case of ST_Intersects(), the index only helps the first stage where it narrows down match candidates using a less precise bounding box intersection operator &&, but after that it has to perform a fairly expensive, vertex-by-vertex intersection check for all candidates on the Filter stage. && descending down the R-tree is quick, but the final comparison has to do some planar/spherical geometry math, which will always be more expensive than a (hash) equality check it'd do if it just compared strings.

    On 380 admin areas and 20k grid cells, 3-text match takes 50ms, st_intersects() takes 300ms.

    I'm speeding up both updates by creating the table_grid with fillfactor=50 - that way all pages of the table have 50% empty space so that when update comes, the new version of the tuple can be saved right behind the old version.

    Although in the demo I made these functionally equivalent by establishing the 3-text match based on the st_intersects() in the first place, keep in mind the latter is undeterministic. As established in the old thread, without additional match conditions, whenever a grid cell intersects more than one admin area, Postgres will match it unpredictably. A deterministic version of that intersection-based match will be even slower.