In sql column names are given along with table they belong to i.e; in this way- emp.ename or just the column name ename. I'm writing a program to parse these kind of sql statements using PyParsing library. rc1 is the for without table name convention and rc2 for the other.
rc1=delimitedList(column_name("columns*") + Optional(Optional(AS) + column_alias("col_alias*"))) + (ZeroOrMore(delimitedList(rc2)))
rc2= OneOrMore(delimitedList(("tab") + "." + column_name("Source_Columns") + Optional(Optional(AS) + column_alias)))
result_column = "*" | OneOrMore(delimitedList(table_name("tab") + "." + column_name("Source_Columns") + Optional(Optional(AS) + column_alias("col_alias"))))| OneOrMore(rc1) |OneOrMore(rc2)|OneOrMore(rc1+rc2)|OneOrMore(rc2+rc1)
select_core = (SELECT + Optional(DISTINCT | ALL) + OneOrMore(result_column) +
Optional(FROM + join_source("from*"))
In result_column both are read. I gave all the combinations rc1+rc2, rc2+rc1.. Suppose my 1st input query is:
SELECT emp.ename as e FROM scott.employee as emp
Output is:
{'tab': 'emp', 'Source_Columns': 'ename', 'col_alias': 'e', 'table_alias': 'emp', 'from': '{database: scott table: employee }'}
2nd input:
SELECT ename as e, fname as f FROM scott.employee as emp
output:
{'columns': 'ename fname', 'col_alias': 'e f', 'table_alias': 'emp', 'from': '{database: scott table: employee }'}
3rd input: (Here I combine both conventions i.e' rc2+rc1)
SELECT emp.eid, fname,lname FROM scott.employee as emp
Output:
{'tab': 'emp', 'Source_Columns': 'eid'}
4th input: (Here I combine both conventions i.e' rc1+rc2)
SELECT ename, lname, emp.eid FROM scott.employee as emp
Output:
{'columns': 'ename lname emp'}
I hope you understand my problem from these I/O
The 3rd output should have been something like this: {'tab': 'emp', 'Source_Columns': 'eid', 'columns':'fname lname','from': '{database: scott table: employee }'}
and the 4th one: {'columns': 'ename lname ','tab': 'emp', 'Source_Columns': 'eid',from': '{database: scott table: employee }'}
What am I missing? (outputs can be seen using select_core.runTests(tests)
and input can be given as a string in tests
)
Here are your tests laid out for runTests:
tests = """\
SELECT emp.ename as e FROM scott.employee as emp
SELECT ename as e, fname as f FROM scott.employee as emp
#Here I combine both conventions i.e' rc2+rc1)
SELECT emp.eid, fname,lname FROM scott.employee as emp
#Here I combine both conventions i.e' rc1+rc2)
SELECT ename, lname, emp.eid FROM scott.employee as emp
"""
Using the parser from select_parser.py, runTests output is:
SELECT emp.ename as e FROM scott.employee as emp
['SELECT', [['emp.ename', 'AS', 'e']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
- columns: [['emp.ename', 'AS', 'e']]
[0]:
['emp.ename', 'AS', 'e']
- from: [['scott', '.', 'employee']]
[0]:
['scott', '.', 'employee']
- database: ['scott']
- table: [['employee']]
[0]:
['employee']
- table_alias: [['emp']]
[0]:
['emp']
SELECT ename as e, fname as f FROM scott.employee as emp
['SELECT', [['ename', 'AS', 'e'], ['fname', 'AS', 'f']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
- columns: [['ename', 'AS', 'e'], ['fname', 'AS', 'f']]
[0]:
['ename', 'AS', 'e']
[1]:
['fname', 'AS', 'f']
- from: [['scott', '.', 'employee']]
[0]:
['scott', '.', 'employee']
- database: ['scott']
- table: [['employee']]
[0]:
['employee']
- table_alias: [['emp']]
[0]:
['emp']
#Here I combine both conventions i.e' rc2+rc1)
SELECT emp.eid, fname,lname FROM scott.employee as emp
['SELECT', [['emp.eid'], ['fname'], ['lname']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
- columns: [['emp.eid'], ['fname'], ['lname']]
[0]:
['emp.eid']
[1]:
['fname']
[2]:
['lname']
- from: [['scott', '.', 'employee']]
[0]:
['scott', '.', 'employee']
- database: ['scott']
- table: [['employee']]
[0]:
['employee']
- table_alias: [['emp']]
[0]:
['emp']
#Here I combine both conventions i.e' rc1+rc2)
SELECT ename, lname, emp.eid FROM scott.employee as emp
['SELECT', [['ename'], ['lname'], ['emp.eid']], 'FROM', ['scott', '.', 'employee'], 'AS', 'emp']
- columns: [['ename'], ['lname'], ['emp.eid']]
[0]:
['ename']
[1]:
['lname']
[2]:
['emp.eid']
- from: [['scott', '.', 'employee']]
[0]:
['scott', '.', 'employee']
- database: ['scott']
- table: [['employee']]
[0]:
['employee']
- table_alias: [['emp']]
[0]:
['emp']
These shouldn't really be two different statements in pyparsing, you should be able to write just one what captures an optional leading table name, and give it an appropriate results name. Then make the column name a group so that name and table can be extracted from each one.