I am trying to parse and read the components of a complex select statement based on the select_parser.py
example, but I cannot seem to make it work for a select statement with a UNION.
My sample select statement is
select x as foo from xyz where x='1' union select y from abc except select z from pqr
This statement gets correctly parsed, and when I extract the dict from the parse results, I would expect some way to distinguish between the 3 select statements here. However the dict is overwritten internally and returns only the last select statement.
parseResults.asDict()
on the above statement gives me
{'columns': [{'col': {'col': ['z']}}],
'from': [{'table': [['xyz']]}, {'table': [['abc']]}, {'table': [['pqr']]}],
'where_expr': [{'col': ['x']}, '=', '1']}
Which seems jumbled. I would ideally expect the result to look something like this
{
'select_core': {
'columns': [
{
'col': {
'col': [
'x'
]
}
}
],
'from': [
{
'table': [
[
'xyz'
]
]
}
],
'where_expr': [
{
'col': [
'x'
]
},
'=',
'1'
]
},
compund_operation: [
{
operation: 'UNION'
'columns': [
{
'col': {
'col': [
'y'
]
}
}
],
'from': [
{
'table': [
[
'abc'
]
]
}
],
'where_expr': []
},
{
operation: 'EXCEPT'
'columns': [
{
'col': {
'col': [
'z'
]
}
}
],
'from': [
{
'table': [
[
'pqr'
]
]
}
],
'where_expr': []
}
]
}
Any help on how I should label the grammar so that I can get a result like the above?
I think this has something to do with how the statements are labeled. I have tried labeling the select statements, but I am unable to correctly label the + ZeroOrMore(compound_operator + select_core)
statement to give me the nested dict I desire.
Nice work on writing this extension. Wrap the select_core
in a Group to keep the results names from overwriting each other. I'm guessing that this expression in full is something like select_core + ZeroOrMore(compound_operator + select_core)
. You can rewrite as:
expr = Group(select_core) + ZeroOrMore(compound_operator + Group(select_core))
or just:
select_core = Group(select_core)
expr = select_core + ZeroOrMore(compound_operator + select_core)
Either way, wrapping the select_core in a Group will make it have its own names.
EDIT: Took another look at select_parser.py, and I think these changes will do the proper grouping for you (and I'll include these changes in the next release, they look like a good idea anyway).
select_core = Group( # <-- add this Group
SELECT
+ Optional(DISTINCT | ALL)
+ Group(delimitedList(result_column))("columns")
+ Optional(FROM + join_source("from*"))
+ Optional(WHERE + expr("where_expr"))
+ Optional(
GROUP
+ BY
+ Group(delimitedList(ordering_term))("group_by_terms")
+ Optional(HAVING + expr("having_expr"))
)
)
select_stmt << (
Group( # <-- add Group
select_core + ZeroOrMore(compound_operator + select_core)
)("select_terms") # <-- results name
+ Optional(ORDER + BY + Group(delimitedList(ordering_term))("order_by_terms"))
+ Optional(
LIMIT
+ (Group(expr + OFFSET + expr) | Group(expr + COMMA + expr) | expr)("limit")
)
)