Search code examples
postgresqlpostgis

PostGIS minimum distance of two sets including other variables from both tables


I have two tables (table1 and table2) with three columns: id, value and geometry. The geometries are point features.

I want to do a join between both tables where the resulting table contains for each point of table1, the minimum distance to a point of table2, the value of table1 and the value of the corresponding point of table2.

I tried the following code, but logically, this gives for each poin of table1 the distance to each point of table2. However, I cannot leave v2 from the group by clause. How can I get the table I want?

SELECT t1.value AS v1,
    t2.value AS v2,
    MIN(st_distance(t1.eometry, t2.geometry)) AS dis                        
   FROM table1 t1, table2 t2
GROUP BY v1, v2

Solution

  • For some simplicity I simply took integer values and their differences instead of the distance between points (but it should be exactly the same: just change the subtraction against the st_distance function):

    demo:db<>fiddle

    SELECT DISTINCT ON (v1.point) 
         v1.point, 
         v2.point, 
         abs(v1.point - v2.point) 
    FROM
    table1 v1
    CROSS JOIN table2 v2
    ORDER BY v1.point, abs(v1.point - v2.point)
    

    My tables:

    table1.point: 1, 2, 4, 8, 16
    table2.point: 2, 3, 5, 7, 11, 13
    

    The result:

    | point | point | abs |
    |-------|-------|-----|
    |     1 |     2 |   1 |
    |     2 |     2 |   0 |
    |     4 |     3 |   1 |
    |     8 |     7 |   1 |
    |    16 |    13 |   3 |
    

    Explanation:

    You have to calculate all differences to know which one is the smallest. That's the reason for the CROSS JOIN. Now you can ORDER BY the points of table1 and the differences (or distances). Notice the abs() function: This makes all negative values positive. Otherwise difference -42 would be taken instead of +1.

    DISTINCT ON (v1.point) takes the first ordered row for each v1.point.

    Notice:

    Because of the CROSS JOIN and the heavy mathematics in st_distance it could be really slow for huge data sets!