Search code examples
mysqlsqlpostgresqlsubquerylexicographic

order definition for comparisons in SQL ROW subqueries?


I was wondering when a row subquery is performed with a comparison operator such as > or >=, is the order of comparison defined using lexicographic (i.e. dictionary) order or is it defined element-wise?

That is, for ROW (A, B), should

(79, 48) > (75, 52) 

be TRUE (dictionary) or FALSE (element-wise) in a row query in the WHERE clause?

I tested this in PostgreSQL, and it seems that it's using the dictionary order, i.e. (79, 48) > (75, 52) is TRUE because 79 > 75 and the second component doesn't matter therefore. Searching around, it seems that this is also the case with MySQL: MySQL row subquery comparison issue, and the MySQL documentation seems to be confusing on this point. Searching for postgresql row subquery doesn't show much about the comparison order.

While the dictionary order makes sense from a computer science perspective, it may look a bit weird to a database user because the order of rows now depends on which column you list first in the SQL. For example, using dictionary order, we should have:

 (52, 75) > (48, 79)  

for ROW (B,A). The same rows are compared, the order is exactly the opposite because column B is listed first.

My question is:

Is this behavior (use of dictionary order in row queries) from SQL standards/cross-vendor or is it implementation specific? Any references for this?


Solution

  • This is documented in the chapter Row Constructor Comparison of the Postgres manual:

    For the <, <=, > and >= cases, the row elements are compared left-to-right, stopping as soon as an unequal or null pair of elements is found. If either of this pair of elements is null, the result of the row comparison is unknown (null); otherwise comparison of this pair of elements determines the result. For example, ROW(1,2,NULL) < ROW(1,3,0) yields true, not null, because the third pair of elements are not considered.

    And:

    Note: Prior to PostgreSQL 8.2, the <, <=, > and >= cases were not handled per SQL specification. A comparison like ROW(a,b) < ROW(c,d) was implemented as a < c AND b < d whereas the correct behavior is equivalent to a < c OR (a = c AND b < d).

    Which clarifies that the behavior of modern Postgres is according to the SQL standard.

    It's basically the same logic as in the ORDER BY clause of a SELECT query: items are compared left to right until the first inequality is found - except for NULL values, which sort last in default ascending order.

    Related: