For my own interest I am writing an ANSI SQL lexer. Specifically, I am trying to conform to ISO/IEC 9075-2:2003(E). I ran into an ambiguity at the token stage.
The lexical elements section defines an interval string as follows:
<interval string> ::= <quote> <unquoted interval string> <quote>
<unquoted interval string> ::= [ <sign> ] { <year-month literal> | <day-time literal> }
<year-month literal> ::= <years value> [ <minus sign> <months value> ] | <months value>
<years value> ::= <datetime value>
<months value> ::= <datetime value>
<datetime value> ::= <unsigned integer>
<unsigned integer> ::= <digit>...
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Example: '30'
Is the 30 a <years value> without the option or is it a <months value>?
In theory I could write: SELECT '30'
I created a YearsValue token and a MonthsValue token (classes). However, the ambiguity is an issue: '30' matches both. I don't see anything that specifically deals with multiple matches in part 1 or part 2 of ISO/IEC 9075.
Can someone point out where in the spec this is handled, or is it just assumed to be resolved left to right?
Before anyone asks, I am doing this because I want to write a SQL lexer. It's not for school; it's just something to educate myself. I don't want to use GOLD or ANTLR either.
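The overlap can be reproduced with a small sketch. It assumes each production quoted above is approximated by a regular expression; the names `ALTERNATIVES` and `matching_alternatives` are hypothetical and do not come from the standard.

```python
import re

# Regex approximations of the quoted productions (an assumption for
# illustration; the standard defines these as BNF, not as regexes).
UNSIGNED_INTEGER = r"[0-9]+"
YEARS_VALUE = UNSIGNED_INTEGER   # <years value>  ::= <datetime value>
MONTHS_VALUE = UNSIGNED_INTEGER  # <months value> ::= <datetime value>

# The two alternatives of <year-month literal>.
ALTERNATIVES = {
    "years value [ minus sign, months value ]": rf"{YEARS_VALUE}(?:-{MONTHS_VALUE})?",
    "months value": MONTHS_VALUE,
}


def matching_alternatives(text):
    """Return the names of every alternative that matches the whole text."""
    return [
        name
        for name, pattern in ALTERNATIVES.items()
        if re.fullmatch(pattern, text)
    ]
```

With these patterns, `matching_alternatives("30")` returns both alternative names, while `matching_alternatives("1-6")` returns only the first, so the text alone cannot decide which token class a bare integer belongs to.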
Is the 30 a <years value> without the option or is it a <months value>?
Based on my reading of a draft of SQL 2003, it is left ambiguous in a way that doesn't matter. Yes, the grammar does not specify whether the 1 in INTERVAL '1' YEAR is a <years value> or a <months value>, or even perhaps a <days value>, but it really does not matter. The description of how YEAR is interpreted is clear that 1 is a number of years, even if it is parsed as a <months value>. The standard says that the first component in the value is mapped to the first field type in the interval type:
5.3 <literal>
General Rules
7) The i-th datetime component in a <datetime literal> or <interval literal> assigns the value of the datetime component to the i-th <primary datetime field> in the <datetime literal> or <interval literal>.
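One way to apply this rule in a lexer or parser is to tokenise every component as a plain unsigned integer and let the interval qualifier decide what each one means. The following is a minimal sketch under simplifying assumptions: `FIELDS` and `assign_fields` are hypothetical names, fractional seconds are not handled, and the separators and field combinations are not validated against the standard.

```python
import re

# Datetime fields in significance order (illustrative list).
FIELDS = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"]


def assign_fields(unquoted, start_field, end_field=None):
    """Map the i-th numeric component to the i-th field of the qualifier.

    `unquoted` is the text between the quotes, e.g. "30" or "1-6".
    `start_field` / `end_field` mirror a qualifier such as YEAR TO MONTH.
    """
    first = FIELDS.index(start_field)
    last = FIELDS.index(end_field) if end_field else first
    fields = FIELDS[first:last + 1]

    # An optional leading sign applies to the interval as a whole.
    sign = -1 if unquoted.startswith("-") else 1
    body = unquoted[1:] if unquoted[:1] in ("+", "-") else unquoted

    # Components are classified by position, not by token type.
    components = re.split(r"[- :]", body)
    well_formed = all(re.fullmatch(r"[0-9]+", c) for c in components)
    if len(components) != len(fields) or not well_formed:
        raise ValueError(f"{unquoted!r} does not fit {fields}")

    return {field: sign * int(c) for field, c in zip(fields, components)}
```

Under these assumptions, `assign_fields("30", "YEAR")` gives `{"YEAR": 30}` and `assign_fields("30", "MONTH")` gives `{"MONTH": 30}`: the same token is interpreted differently depending only on the qualifier, so the lexer never has to choose between a years token and a months token.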