Search code examples
pythonsqlpandasdb2ibm-cloud

Python Pandas Read SQL from IBM DB2 with non ASCII characters


I am using Pandas Read Sql in Python for executing sql-statements in IBM DB2 and saving the response in pandas dataframes. Now I am trying to execute a sql-statement containing a non ASCII character, a letter from the Swedish alphabet: 'Å' (others are Å, Ä and Ö) but I am getting this error:

"DatabaseError: Execution failed on sql 'SELECT * FROM DATA_CONFIG WHERE TAG_NAME='Å'': ibm_db_dbi::Error: Error occure during processing of statement"

This is my code:

import ibm_db
import ibm_db_dbi
import pandas as pd

def sqlToFrame(sql): # Function for creating pandas dataframes from SQL-statements
    con = ibm_db.connect(connection_string, "", "")
    conn = ibm_db_dbi.Connection(con)
    return pd.read_sql(sql, conn)

df = sqlToFrame("SELECT * FROM DATA_CONFIG WHERE TAG_NAME='Å'")

I've tried executing the statement in the IDE in the IBM DB2 interface which works perfectly fine. Therefore I am figuring that the problem might be connected to how I establish the connection/the DB2 driver pandas uses. I have tried finding a way to set an encoding but can't find anything. How can I solve this? I also know this is possible because another package that builds upon ibm_db accept these characters. The characters are a from the ISO-8859-1 series


Solution

  • Consider parameterizing your query using the params argument of pandas.read_sql and pass the accented character 'Å' with u'' prefix to bind value to the unquoted ? placeholder in SQL query. Do note: params requires a sequence and so below passes a tuple of one item.

    Unlike Python 2.x, all strings in Python 3.x are Unicode strings and so accented literals (non-ascii) do not need explicit decoding with u'...'. Hence, why I cannot reproduce your issue in my Python 3.5 running a DB2 SQL query with accented characters.

    import pandas as pd
    ...
    
    # Function for creating pandas dataframes from SQL-statements
    def sqlToFrame(sql): 
        db = ibm_db.connect(connection_string, "", "")
        con = ibm_db_dbi.Connection(db)
    
        return pd.read_sql(sql, con, params = (u'Å',))
    
    
    df = sqlToFrame("SELECT * FROM DATA_CONFIG WHERE TAG_NAME = ?")