Search code examples
pythonpyparsing

How to get table names from SQL using select_parser.py from pyparsing?


When I take select_parser.py from pyparsing, and run it with pyparsing 2.2.0 using this code:

query="select z.a, b from test_table left join test2_table where 1=1 and b in (select bb from foo)"
for key, val in select_stmt.parseString(query, parseAll=True).items():
    print "%s: %s" % (key, val)

I get

$ python select_parser.pyparsing.py
where_expr: [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]
from: [['test_table', ['LEFT', 'JOIN'], 'test2_table', []]]
columns: [['z.a'], ['b']]

Although there is a named element "table" in the original definition:

single_source = ( (Group(database_name("database") + "." + table_name("table*")) | table_name("table*")) + 

there is no dict key that comes out with the name "table".

Perhaps the "from" element consumed things first? I don't understand the exact logic of how the named elements get populated, and I haven't gotten a clear idea from reading the docs (several talks, etc.).

How can I use select_parser.py to get all the table names in a SQL query?

Note: the right answer here is a list (or set): test_table, test2_table, foo.

I could go through "from" and ditch lists, but that seems hacky, I don't know if it would work, and it doesn't seem like how pyparsing is supposed to work.

I see this question, this one, and this one, but I don't understand how they help here.


Solution

  • The sample code you linked to includes a call to runTests. This is a great tool to try different test strings, and to write unit tests for your parser.

    When inserting your query string in the call to runTests, we get this output:

    select z.a, b from test_table left join test2_table where 1=1 and b in (select bb from foo)
    ['SELECT', [['z.a'], ['b']], 'FROM', ['test_table', ['LEFT', 'JOIN'], 'test2_table', []], 'WHERE', [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]]
    - columns: [['z.a'], ['b']]
      [0]:
        ['z.a']
      [1]:
        ['b']
    - from: [['test_table', ['LEFT', 'JOIN'], 'test2_table', []]]
      [0]:
        ['test_table', ['LEFT', 'JOIN'], 'test2_table', []]
        - table: [['test_table'], ['test2_table']]
          [0]:
            ['test_table']
          [1]:
            ['test2_table']
    - where_expr: [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]
      [0]:
        ['1', '=', '1']
      [1]:
        AND
      [2]:
        ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]
        [0]:
          b
        [1]:
          IN
        [2]:
          ['SELECT', [['bb']], 'FROM', 'foo']
          - columns: [['bb']]
            [0]:
              ['bb']
          - from: ['foo']
          - table: [['foo']]
            [0]:
              ['foo']
    

    The 'table' names are there, but you'll have to do some navigation of the structure to get to them. Here is one way:

    result = select_stmt.parseString(query)
    table_names = []
    def visit_structure(struct):
        if 'table' in struct:
            table_names.extend(t[0] for t in struct.table)
        for substruct in struct:
            if isinstance(substruct, ParseResults):
                visit_structure(substruct)
    
    visit_structure(result)
    print(table_names)
    

    Gives:

    ['test_table', 'test2_table', 'foo']
    

    For future listing of your parsed data, make use of the ParserElement.runTests and ParseResults.dump methods.