Tags: python, encoding, ubuntu-14.04, pyodbc, string-decoding

String encoding issue with pyodbc's returned data


When I run a query with PyODBC, the table data comes back as:

 u'\u3836\u3431\u3132\u3230\u3030'

The actual content of the database column is:

 6814210200

When I explicitly encode the value returned by pyodbc as UTF-16, this is the closest I get:

>>> print d['data'][0]['upc'].encode('utf-16')
 ��6814210200
#^^ two junk characters

My question is: How can I get the encoded value directly from the PyODBC query?

I already have CHARSET=UTF16 set in my database connection string as:

pyodbc.connect("DRIVER=<driver_name>;"
               "SERVER=<server_ip>;"
               "DATABASE=<database>;"
               "UID=<user>;"
               "PWD=<password>;"
               "CHARSET=UTF16",    # setting charset
               ansi=True)

Also, in all my odbc.ini and odbcinst.ini files, I have set:

 UnicodeTranslationOption = utf16 
 CharacterTranslationOption = all

under my driver's setting.


Solution

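  • You need to specify the little-endian variant of UTF-16. The plain 'utf-16' codec prepends a byte order mark (BOM), which shows up as the two junk characters in your output, whereas 'utf-16le' writes none.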

    s = u'\u3836\u3431\u3132\u3230\u3030'
    print s.encode('utf-16le')
    

    output

    6814210200
    

    FWIW, in Python 3, s.encode('utf-16le') returns b'6814210200'.
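
    As a rough sketch of the same idea in Python 3, the workaround can be wrapped in a small helper. The name repair_misdecoded is hypothetical (it is not part of pyodbc), and the sketch assumes the column really holds plain ASCII text that was wrongly decoded as UTF-16. It also shows where the two junk characters come from: the plain 'utf-16' codec starts its output with a 2-byte BOM.

```python
import codecs


def repair_misdecoded(value):
    """Undo a wrong UTF-16 decode of data that was really ASCII.

    Hypothetical helper for illustration; assumes the original bytes
    were plain ASCII, as in the question's UPC column.
    """
    # Re-encode without a BOM to recover the original bytes,
    # then decode those bytes as the text they actually are.
    return value.encode("utf-16-le").decode("ascii")


raw = u"\u3836\u3431\u3132\u3230\u3030"  # value from the question
fixed = repair_misdecoded(raw)
print(fixed)

# The plain "utf-16" codec prepends a byte order mark in native byte order,
# which is why the question's output starts with two junk characters.
with_bom = raw.encode("utf-16")
print(with_bom[:2] == codecs.BOM_UTF16)
```

    This only repairs the value after the fact. It does not change how the driver or pyodbc decodes the column in the first place.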