Search code examples
sqltypestype-conversionlanguage-lawyerdatabase-agnostic

ANSI sql for comparison with string and non-string


Let's say I have any one of the following expression:

SELECT
    DATE '2014-01-01' < '2014-02-01',
    DATE '2014-01-01' < '321',
    9 < '10',
    9 < 'a'

Is there a recommendation or requirement in the SQL standard of how these should be compared? I suppose the three levels or 'strictness' would be the following:

  1. Raise an error [most strict] -- all 4 expressions above would fail.
  2. Try casting the string to the non-string type, if it doesn't work raise an error -- expressions 2 and 4 above would fail.
  3. Try casting the string to the non-string type, if it fails fallback to casting the non-string to a string -- all 4 above work.
  4. Cast the non-string operand to a string -- all 4 above work.

It seems BigQuery uses the second approach, postgres uses something like 2/3 (only the last one fails), mysql uses either 3 or 4 (no fail).

Does the standard make any recommendations here?


Solution

  • Does the standard make any recommendations here?

    I believe you're asking if there are any official rules regarding automatic type conversion for comparison? The answer is no*.

    For comparison operation, the specs mention that data types [...] shall be comparable. Comparison of different data types is not described but it does say that implicit type conversion can occur in expressions [...]. So RDBMS is allowed to convert the data type of one or both operands to compare them.

    That being said, the rules for automatic type conversion are implementation-dependent. The rules vastly differ across RDBMS. Consult your RDBMS' documentation to understand them.

    SQL Server for example, uses data type precedence to convert the data type of one of the operands to match the other:

    • For DATE '2014-01-01' < '2014-02-01', the varchar value will be converted to date for comparison
    • For 9 < '10', the varchar value will be converted to int for comparison

    MySQL has a different set of rules:

    • For DATE '2014-01-01' < '2014-02-01', both values will be converted to timestamps (not timestamp data type) for comparison
    • For 9 < '10', both values will be converted to floating point numbers for comparison

    Unfortunately implicit conversions have too many gotchas and they should be avoided. For example 2.0 = '2.01' is true in SQL Server and 2 = '2foo' is true in MySQL.

    Use the CAST function, variables of correct type (DECLARE @userdate AS DATE = '20120201') or appropriate functions (STR_TO_DATE('01-02-2012','%d-%m-%Y')) on strings.


    * The answer is based on SQL-92 standard — very old but still relevant to the question.