Search code examples
sqlpostgresqlcross-joinredundancy

reducing redundant product from SQL cross join


I was stepping through a common SQL exercise and ran into an annoyance. Question is whether you have a better solve than mine below. The initial problem was "find all the coordinate sets on a plain that have the shortest overall distance." That's straightforward. However, when you select all those rows that have the shortest distance, you get (for example):

x y x2 y2 dist globalmin
-2 0 -1 -2 2.23606797749979 2.23606797749979
-2 0 0 1 2.23606797749979 2.23606797749979
-1 -2 -2 0 2.23606797749979 2.23606797749979
0 1 -2 0 2.23606797749979 2.23606797749979

You see the problem -- I'm pulling redundant rows because an earlier cross join has mirrored the x and y the values.

The solution I found was to to distinct a least, greatest swap like so:

select
distinct
least(t1.x,t1.x2) x1, greatest(t1.x,t1.x2) x2, least(t1.y,t1.y2) y1, greatest(t1.y,t1.y2) y2
,dist,globalmin
from dists2 t1
where t1.dist=t1.globalmin
;

Result:

x1 x2 y1 y2 dist globalmin
-2 -1 -2 0 2.23606797749979 2.23606797749979
-2 0 0 1 2.23606797749979 2.23606797749979

...so that approach will fix the redundancy issue, but it feels inelegant, like there's a classic maneuver I'm missing. I welcome any advice on a better approach.

Thanks!


Solution

  • When you have 2 distinct points A and B, your query says:

    • Give me the distance from the top/left point to the bottom/right point (i.e. the distance from e.g. A to B).
    • Also give me the distance from the bottom/right point to the top/left point (i.e. the opposite of the above = distance from B to A).

    Armed with that knowledge, all we have to do is create an order between points. One case that would work:

    • If A is left of B (A.x < B.x), then A < B.
    • If A is placed on a vertical line that goes through B (A.x = B.x) and is above B (A.y < B.y), then A < B.
      Note: not that it matters but for A to be said to be above B, the y axis must be oriented down.
    • In every other case, A >= B.

    You can show that it is indeed an order relation (to be precise, a lexicographic order), although the way I described it makes it a strict order (no points is "before" itself).
    Also, the set of points is not just partially ordered: every pair of points can be compared. It is said to be totally ordered.

    Now that we have a way to order every point against every other point, we can work with a non-equi join.


    Quick example: the query below works on 4 points placed on the 4 corners of a square.

    • The point of coordinates (0,0) is ordered before the other 3 points, so it is joined against them all.
    • The points of coordinates (0,1) is ordered before the 2 points on its right but not before (0,0), as the latter is placed vertically and above the former.
    • The point of coordinates (1,0) is only ordered before (1,1), which is placed vertically to it, but under it.
    WITH Points(x, y) AS (
    VALUES (0,0), (1,0), (0,1), (1,1)
    )
    SELECT p1.x as x1, p1.y as y1, p2.x as x2, p2.y as y2
    FROM Points p1
    JOIN Points p2 ON p1.x < p2.x OR (p1.x = p2.x AND p1.y < p2.y)
    ORDER BY x1, y1, x2, y2
    

    From there, it is easy to turn a point_table p1 CROSS JOIN point_table p2 into the proper point_table p1 INNER JOIN point_table p2 ON ... on the model shown above.
    However, contrary to your question title, you seem to have a single dists2 table/view.

    • If it is a view, change its definition by replacing the CROSS JOIN by an INNER JOIN just like I said.
    • If it is a table, the condition we have seen in the ON clause must be moved to a WHERE clause.
    SELECT x1, x2, y1, y2 ,dist,globalmin
    FROM dists2
    WHERE x1 < x2 OR (x1 = x2 AND y1 < y2)
    AND dist=globalmin