python sqlalchemy db2 ibm-midrange pyodbc

pyODBC + unixodbc + Db2 for iSeries = UnicodeDecodeError, illegal UTF-16 surrogate

I am able to successfully use pyODBC with SQLAlchemy in my docker container to connect to DB2 for iSeries (version 7.2). It works, but intermittently I'll run a query and get back the following traceback:

>>> Groups.query.get(group_id)

Traceback (most recent call last):
...
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 154, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 550, in do_execute
    cursor.execute(statement, parameters)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 64-65: illegal UTF-16 surrogate

Sometimes this will happen many times in a row and then suddenly stop. It does not always happen with the same set of queries, and I have tried two nearly-identical DB2 servers and gotten the same results.

The 'position 64-65' is always the same for the same query (even though sometimes the query returns correct results).

Versions:

pyodbc: 4.0.26
unixodbc: 2.3.4-1 (debian)
SQLAlchemy: 1.3.5
iAccess driver: ibm-iaccess-1.1.0.11-1.0

Solution

Finally found it.

Somewhere along the stack, there is a limit of 30 characters for column name aliases. My guess at this is pyodbc (db2 supports 128-length column names and aliases) and I've raised an issue on GitHub to track this problem.

When the 30-char limit is surpassed, pyodbc still attempts to decode a string with the original length of the column name, therefore it is attempting to decode garbage data, sometimes resulting in a UnicodeDecodeError (and all other times returning garbage data.

This is specific to column names (so cursor.keys() will show the garbage column names).

My workaround is to force SQLAlchemy to truncate column aliases by using a custom dialect.

customdb2.py:

from ibm_db_sa.pyodbc import AS400Dialect_pyodbc

class CustomAS400Dialect(AS400Dialect_pyodbc):
    max_identifier_length = 30
registry.register('db2.pyodbc400_custom', 'customdb2', 'CustomAS400Dialect')