I'm attempting to write an ANTLR4 grammar for LookML. The schema for this language is relatively straightforward, but it has two wrinkles: first, it supports a templating language that can be used in most "properties", and second, a few fields allow arbitrary SQL expressions.
My issue has to do with how the lexer produces tokens in different contexts.
LookML has a property for case/when expressions, for example:
dimension: query_type {
  type: string
  case: {
    when: {
      label: "SELECT Query"
      sql: ${name} ILIKE 'SELECT%'
      ;;
    }
    else: 'Other'
  }
}
So I have these tokens in my lexer:
CASE: 'case';
WHEN: 'when';
ELSE: 'else';
But you can also have CASE/WHEN statements in the arbitrary SQL fields:
dimension: full_name {
  type: string
  sql: case when true then 'its true' else 'its false' end ;;
}
My parser rule for the sql expression can't just "catch all" everything between the sql: and the ;; because there could be template variables that I want parsed. When the lexer runs, it considers the CASE WHEN as the reserved keywords intended to match up with the case when properties. My sql property rule then needs to account for CASE | WHEN | ELSE and really any reserved keyword in LookML that could also find its way into arbitrary SQL code.
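I've considered a few options:
1. Account for every reserved keyword in the sql property parser rule and let the tokenizer think that those are tokens.
2. Treat everything between sql: and ;; as one big token and handle parsing the possible template values in the application code.
3. Match the LookML keywords together with their colon, as in 'case:'.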
Are any of these common approaches to this problem? This is my first grammar from scratch so I could be missing the point entirely here. I also tried looking into modes but I can't tell if that is actually the right application here.
When parsing a language inside a language (SQL inside LookML), you could use lexical modes. Lexical modes are only allowed in lexer grammars, so you'll need to split the lexer and the parser into separate grammars.
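If lexical modes are new to you, the mechanism boils down to a stack of rule sets: the lexer only tries the rules of the mode on top of the stack. Here is a toy sketch of that idea in plain Python. It is not how ANTLR is implemented (ANTLR prefers the longest match, while this sketch simply takes the first rule that matches), all names in it are illustrative, and the TEMPLATE and SQL_TEXT rules are hypothetical additions of mine to show that a template reference can still get its own token inside the SQL block:

```python
import re

# Toy rule tables, one list per mode: (token type, regex, action).
# All names are illustrative; this only mimics pushMode/popMode.
RULES = {
    "DEFAULT": [
        ("SQL", r"sql[ \t\r\n]*:", "push:SQL"),  # like -> pushMode(SqlMode)
        ("CASE", r"case\b", None),
        ("WHEN", r"when\b", None),
        ("ELSE", r"else\b", None),
        ("ID", r"[A-Za-z_][A-Za-z_0-9]*", None),
        ("STRING", r'"[^"]*"', None),
        ("SPACES", r"[ \t\r\n]+", "skip"),
        ("OTHER", r".", None),
    ],
    "SQL": [
        ("SCOL2", r";;", "pop"),                 # like -> popMode
        ("TEMPLATE", r"\$\{[^}]*\}", None),      # hypothetical: ${name}
        ("SPACES", r"[ \t\r\n]+", "skip"),
        # hypothetical catch-all: runs until whitespace, ';;' or '${'
        ("SQL_TEXT", r"(?:[^;$ \t\r\n]|;(?!;)|\$(?!\{))+", None),
    ],
}
COMPILED = {
    mode: [(name, re.compile(rx), action) for name, rx, action in rules]
    for mode, rules in RULES.items()
}


def tokenize(text):
    """Return (type, text) pairs, using only the rules of the current mode."""
    modes = ["DEFAULT"]  # the mode stack; the last entry is the active mode
    pos, tokens = 0, []
    while pos < len(text):
        for name, regex, action in COMPILED[modes[-1]]:
            match = regex.match(text, pos)
            if match:
                break  # first matching rule wins in this toy version
        else:
            raise ValueError(f"no rule matches at offset {pos}")
        pos = match.end()
        if action == "skip":
            continue
        tokens.append((name, match.group()))
        if action == "pop":
            modes.pop()
        elif action is not None and action.startswith("push:"):
            modes.append(action[len("push:"):])
    return tokens


if __name__ == "__main__":
    for token in tokenize("sql: case when ${flag} then 1 end ;; case: x"):
        print(token)
```

In this sketch the case inside the SQL block comes out as SQL_TEXT, because the CASE rule does not exist in that mode, while the case after ;; is a CASE keyword again.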
A quick demo:
lexer grammar LookMLLexer;

DIMENSION : 'dimension';
// Seeing 'sql:' switches the lexer into SqlMode until ';;' is reached
SQL       : 'sql' SPACE* ':' -> pushMode(SqlMode);
CASE      : 'case';
WHEN      : 'when';
ELSE      : 'else';
COL       : ':';
OBRACE    : '{';
CBRACE    : '}';
STRING    : '"' .*? '"';
ID        : [a-zA-Z_] [a-zA-Z_0-9]*;
COMMENT   : '#' ~[\r\n]* -> skip;
SPACES    : SPACE+ -> skip;
OTHER     : .;

fragment SPACE : [ \t\r\n];

mode SqlMode;

// ';;' ends the SQL block and returns to the default mode
SCOL2      : ';;' -> popMode;
SELECT     options { caseInsensitive = true; } : 'select';
FROM       options { caseInsensitive = true; } : 'from';
// Reuse the ID pattern and emit the token as a regular ID
SQL_ID     : ID -> type(ID);
SQL_SPACES : SPACE+ -> skip;
parser grammar LookMLParser;

options {
  tokenVocab=LookMLLexer;
}

parse
 : dimension EOF
 ;

dimension
 : DIMENSION ':' ID '{' case_when key_value* '}'
 ;

case_when
 : CASE ':' '{' when+ else '}'
 ;

when
 : WHEN ':' '{' sql key_value* '}'
 ;

else
 : ELSE ':' value
 ;

key_value
 : ID ':' value
 ;

value
 : STRING
 | ID
 ;

sql
 : SQL sql_stat SCOL2
 ;

sql_stat
 : SELECT ID FROM ID
 ;
The grammars above will parse the input:
dimension: field_name {
  case: {
    when: {
      sql: SELECT a FROM b ;;
      label: "value"
    }
    # Possibly more when statements
    else: "value"
  }
  alpha_sort: yes
}
as follows: