Search code examples
pyparsing

Pyparsing SQL Selects: Unable to extract complex UNIONS as a Dict


I am trying to parse and read the components of a complex select statement based on the select_parser.py example, but I cannot seem to make it work for a select statement with a UNION.

My sample select statement is select x as foo from xyz where x='1' union select y from abc except select z from pqr

This statement gets correctly parsed, and when I extract the dict from the parse results, I would expect some way to distinguish between the 3 select statements here. However the dict is overwritten internally and returns only the last select statement.

parseResults.asDict() on the above statement gives me

{'columns': [{'col': {'col': ['z']}}], 
'from': [{'table': [['xyz']]}, {'table': [['abc']]}, {'table': [['pqr']]}], 
'where_expr': [{'col': ['x']}, '=', '1']}

Which seems jumbled. I would ideally expect the result to look something like this

{
  'select_core': {
    'columns': [
      {
        'col': {
          'col': [
            'x'
          ]
        }
      }
    ],
    'from': [
      {
        'table': [
          [
            'xyz'
          ]
        ]
      }
    ],
    'where_expr': [
      {
        'col': [
          'x'
        ]
      },
      '=',
      '1'
    ]
  },
  compund_operation: [
    {
      operation: 'UNION'
      'columns': [
        {
          'col': {
            'col': [
              'y'
            ]
          }
        }
      ],
      'from': [
        {
          'table': [
            [
              'abc'
            ]
          ]
        }
      ],
      'where_expr': []
    },
    {
      operation: 'EXCEPT'
      'columns': [
        {
          'col': {
            'col': [
              'z'
            ]
          }
        }
      ],
      'from': [
        {
          'table': [
            [
              'pqr'
            ]
          ]
        }
      ],
      'where_expr': []
    }
  ]
}

Any help on how I should label the grammar so that I can get a result like the above?

I think this has something to do with how the statements are labeled. I have tried labeling the select statements, but I am unable to correctly label the + ZeroOrMore(compound_operator + select_core) statement to give me the nested dict I desire.


Solution

  • Nice work on writing this extension. Wrap the select_core in a Group to keep the results names from overwriting each other. I'm guessing that this expression in full is something like select_core + ZeroOrMore(compound_operator + select_core). You can rewrite as:

    expr = Group(select_core) + ZeroOrMore(compound_operator + Group(select_core))
    

    or just:

    select_core = Group(select_core)
    expr = select_core + ZeroOrMore(compound_operator + select_core)
    

    Either way, wrapping the select_core in a Group will make it have its own names.

    EDIT: Took another look at select_parser.py, and I think these changes will do the proper grouping for you (and I'll include these changes in the next release, they look like a good idea anyway).

    select_core = Group(  # <-- add this Group
        SELECT
        + Optional(DISTINCT | ALL)
        + Group(delimitedList(result_column))("columns")
        + Optional(FROM + join_source("from*"))
        + Optional(WHERE + expr("where_expr"))
        + Optional(
            GROUP
            + BY
            + Group(delimitedList(ordering_term))("group_by_terms")
            + Optional(HAVING + expr("having_expr"))
        )
    )
    
    select_stmt << (
        Group(  # <-- add Group
            select_core + ZeroOrMore(compound_operator + select_core)
        )("select_terms")  # <-- results name
        + Optional(ORDER + BY + Group(delimitedList(ordering_term))("order_by_terms"))
        + Optional(
            LIMIT
            + (Group(expr + OFFSET + expr) | Group(expr + COMMA + expr) | expr)("limit")
        )
    )