python, mysql, sql-parser

Extract all column names from an SQL query in Python using sqlparse


I am trying to extract all the column names from a given SQL query (MySQL to start with). I am using the sqlparse module in Python.

I have some simple code that fetches the columns until the "FROM" keyword is reached. How do I get the column names from the rest of the query?

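import sqlparse
from sqlparse.sql import Identifier, IdentifierList
from sqlparse.tokens import Keyword


def parse_sql_columns(sql):
    """Collect the column expressions that appear before FROM."""
    columns = []
    parsed = sqlparse.parse(sql)
    stmt = parsed[0]  # only the first statement is inspected
    for token in stmt.tokens:
        if isinstance(token, IdentifierList):
            # Several comma-separated columns
            for identifier in token.get_identifiers():
                columns.append(str(identifier))
        elif isinstance(token, Identifier):
            # A single column
            columns.append(str(token))
        elif token.ttype is Keyword:  # FROM
            break
    return columns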

Sample query:

string2 = "SELECT test, ru.iuserid AS passengerid, ru.vimgname FROM ratings_user_driver AS rate LEFT JOIN trips AS tr ON tr.itripid = rate.itripid LEFT JOIN register_user AS ru ON ru.iuserid = tr.iuserid WHERE tr.idriverid='5083' AND tr.iactive='Finished' AND tr.ehailtrip='No' AND rate.eusertype='Passenger' ORDER BY tr.itripid DESC LIMIT 0,10;"

Expected output:

["test", "ru.iuserid AS passengerid", "ru.vimgname", "tr.itripid", "rate.itripid", "ru.iuserid", "tr.iuserid", "tr.idriverid", "tr.iactive", "tr.ehailtrip", "rate.eusertype", "tr.itripid"]

One other issue I am facing is that the WHERE clause is not treated the way I expect by this parser. It is not identified as a keyword, and because of that I am unable to extract information from it cleanly.


Solution

  • You can use SQLGlot to do this.

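    import sqlglot
    import sqlglot.expressions as exp
    
    # sql holds the query string, e.g. string2 from the question.
    # Walk the syntax tree and print every column reference in it.
    for column in sqlglot.parse_one(sql).find_all(exp.Column):
        print(column.sql())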