Search code examples
pythonsqlpyodbcimpala

Pyodbc can't parse query from iql file


I have a docker image with a script which reads an iql file and executes it in impala. Removing the file reference and running a simple script works fine (ie SELECT 1;), but whenever I try to execute the script from a file I get an error

pyodbc.Error: ('HY000', '[HY000] [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : ParseException: Syntax error in line 2:\n\n^\nEncountered: EOF\nExpected: ALTER, COMMENT, COMPUTE, COPY, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH\n\nCAUSED BY: Exception: Syntax error\n (110) (SQLExecDirectW)'

I have tried removing DriverManagerEncoding=UTF-16 from the cloudera.impalaodbc.ini file and various types of sql query (including changing the file extension to isql, iusql) but always get the same error.

Any ideas?

Python function:

   def process_data_to_impala():
         conn = pyodbc.connect(DSN='some_DSN', autocommit=True)
         crsr = conn.cursor()
         with open('/path/to/sql/file.iql','r') as inserts:
             for statement in inserts:
                crsr.execute(statement) 
                print(crsr.fetchall())

Solution

  • The problem relates to the way the python code is looping through the statement and the way ';' ends the query.

    A better way to do it for a single query in a file is

       with open('file/location.iql') as script:
        statement = script.read()
        print(statement)
        crsr = conn.cursor()
        crsr.execute(statement)
    

    And when you have multiple queries in a file and you need to loop:

        with open('file/location.iql') as script:
        sqlscript = script.read()
        for statement in sqlscript.split(';'):
            print(statement)
            crsr = conn.cursor()
            crsr.execute(statement)
    

    When reading multiple queries from one script (ie Drop table, create table) you must remove ';' from the last query. This is because the split creates a final blank query after the last ';'. See this question for more info.

    I hope this helps somebody in the future!