Search code examples
pythonparsingpyparsing

Python: Two parse statements not working together


In sql column names are given along with table they belong to i.e; in this way- emp.ename or just the column name ename. I'm writing a program to parse these kind of sql statements using PyParsing library. rc1 is the for without table name convention and rc2 for the other.

rc1=delimitedList(column_name("columns*") + Optional(Optional(AS) + column_alias("col_alias*"))) + (ZeroOrMore(delimitedList(rc2)))

rc2= OneOrMore(delimitedList(("tab") + "." + column_name("Source_Columns") + Optional(Optional(AS) + column_alias)))

result_column = "*" | OneOrMore(delimitedList(table_name("tab") + "." + column_name("Source_Columns") + Optional(Optional(AS) + column_alias("col_alias"))))| OneOrMore(rc1) |OneOrMore(rc2)|OneOrMore(rc1+rc2)|OneOrMore(rc2+rc1)

select_core = (SELECT + Optional(DISTINCT | ALL) + OneOrMore(result_column) +
                Optional(FROM + join_source("from*")) 

In result_column both are read. I gave all the combinations rc1+rc2, rc2+rc1.. Suppose my 1st input query is:

SELECT emp.ename as e FROM scott.employee as emp

Output is:

{'tab': 'emp', 'Source_Columns': 'ename', 'col_alias': 'e', 'table_alias': 'emp', 'from': '{database:  scott   table:   employee  }'}

2nd input:

SELECT ename as e, fname as f FROM scott.employee as emp

output:

{'columns': 'ename    fname', 'col_alias': 'e    f', 'table_alias': 'emp', 'from': '{database:  scott   table:   employee  }'}

3rd input: (Here I combine both conventions i.e' rc2+rc1)

SELECT emp.eid, fname,lname FROM scott.employee as emp

Output:

{'tab': 'emp', 'Source_Columns': 'eid'}

4th input: (Here I combine both conventions i.e' rc1+rc2)

SELECT ename, lname, emp.eid FROM scott.employee as emp

Output:

{'columns': 'ename    lname    emp'}

I hope you understand my problem from these I/O The 3rd output should have been something like this: {'tab': 'emp', 'Source_Columns': 'eid', 'columns':'fname lname','from': '{database: scott table: employee }'} and the 4th one: {'columns': 'ename lname ','tab': 'emp', 'Source_Columns': 'eid',from': '{database: scott table: employee }'}

What am I missing? (outputs can be seen using select_core.runTests(tests) and input can be given as a string in tests )


Solution

  • Here are your tests laid out for runTests:

    tests = """\
    SELECT emp.ename as e FROM scott.employee as emp
    SELECT ename as e, fname as f FROM scott.employee as emp
    #Here I combine both conventions i.e' rc2+rc1)
    SELECT emp.eid, fname,lname FROM scott.employee as emp
    #Here I combine both conventions i.e' rc1+rc2)
    SELECT ename, lname, emp.eid FROM scott.employee as emp
    """
    

    Using the parser from select_parser.py, runTests output is:

    SELECT emp.ename as e FROM scott.employee as emp
    ['SELECT', [['emp.ename', 'AS', 'e']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
    - columns: [['emp.ename', 'AS', 'e']]
      [0]:
        ['emp.ename', 'AS', 'e']
    - from: [['scott', '.', 'employee']]
      [0]:
        ['scott', '.', 'employee']
        - database: ['scott']
        - table: [['employee']]
          [0]:
            ['employee']
    - table_alias: [['emp']]
      [0]:
        ['emp']
    
    
    SELECT ename as e, fname as f FROM scott.employee as emp
    ['SELECT', [['ename', 'AS', 'e'], ['fname', 'AS', 'f']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
    - columns: [['ename', 'AS', 'e'], ['fname', 'AS', 'f']]
      [0]:
        ['ename', 'AS', 'e']
      [1]:
        ['fname', 'AS', 'f']
    - from: [['scott', '.', 'employee']]
      [0]:
        ['scott', '.', 'employee']
        - database: ['scott']
        - table: [['employee']]
          [0]:
            ['employee']
    - table_alias: [['emp']]
      [0]:
        ['emp']
    
    #Here I combine both conventions i.e' rc2+rc1)
    SELECT emp.eid, fname,lname FROM scott.employee as emp
    ['SELECT', [['emp.eid'], ['fname'], ['lname']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
    - columns: [['emp.eid'], ['fname'], ['lname']]
      [0]:
        ['emp.eid']
      [1]:
        ['fname']
      [2]:
        ['lname']
    - from: [['scott', '.', 'employee']]
      [0]:
        ['scott', '.', 'employee']
        - database: ['scott']
        - table: [['employee']]
          [0]:
            ['employee']
    - table_alias: [['emp']]
      [0]:
        ['emp']
    
    #Here I combine both conventions i.e' rc1+rc2)
    SELECT ename, lname, emp.eid FROM scott.employee as emp
    ['SELECT', [['ename'], ['lname'], ['emp.eid']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
    - columns: [['ename'], ['lname'], ['emp.eid']]
      [0]:
        ['ename']
      [1]:
        ['lname']
      [2]:
        ['emp.eid']
    - from: [['scott', '.', 'employee']]
      [0]:
        ['scott', '.', 'employee']
        - database: ['scott']
        - table: [['employee']]
          [0]:
            ['employee']
    - table_alias: [['emp']]
      [0]:
        ['emp']
    

    These shouldn't really be two different statements in pyparsing, you should be able to write just one what captures an optional leading table name, and give it an appropriate results name. Then make the column name a group so that name and table can be extracted from each one.