Search code examples
sqldatabasealgorithmparsingsql-parser

Do SQL parsers just interpret IN as a series of OR conditions?


When writing an SQL query such as

SELECT ID, NAME FROM USER_TABLE WHERE ID IN (1, 2, 10, 14, 15, ..., n)

does the parser just rephrase that into this?

SELECT ID, NAME FROM USER_TABLE WHERE ID = 1 
                                      OR ID =  2 
                                      OR ID =  10
                                      OR ID =  14
                                      OR ID =  15 
                                      ...
                                      OR ID =  n

Or does it do something else in the background for efficiency? While a nightmare to write out by hand and I would never advocate doing so, is there any theoretical performance benefit or hit to using IN rather than a series of OR conditions like that?


Solution

  • According to the specs (SQL92 for example), x IN (a, b, c) is described as:

    8.4  <in predicate>
    
    [...]
    
    4) The expression
         RVC IN IPV
       is equivalent to
         RVC = ANY IPV
    

    And:

    8.7  <quantified comparison predicate>
    
    [...]
    
    <quantified comparison predicate> ::=
         <row value constructor> <comp op> <quantifier> <table subquery>
    
    <quantifier> ::= <all> | <some>
    
    <all> ::= ALL
    
    <some> ::= SOME | ANY
    
    [...]
    
       c) If the implied <comparison predicate> is true for at least
         one row RT in T, then "R <comp op> <some> T" is true.
    

    The last line seems to suggest that x IN (a, b, c) is supposed to provide identical result as x = a OR x = b OR x = c. However the specs do not dictate how the RDBMs should implement the behavior, it could vary across RDBMs.

    This following posts contain some interesting observations:

    SQL Server seems to generate same execution plan for x IN (...) and x = ... OR x = ... where as MySQL handles them differently.