python-2.7 ascii psycopg2 pypyodbc

How to avoid conversion to ASCII when reading


I'm using Python to read values from SQL Server (pypyodbc) and insert them into PostgreSQL (psycopg2).

A value in the NAME field has come up that is causing errors:

Montaño

The value exists in my MSSQL database just fine (SQL_Latin1_General_CP1_CI_AS collation), and it can be inserted into my PostgreSQL database (UTF8) without problems using pgAdmin and an INSERT statement.

The problem is that selecting it with Python converts the value to:

Monta\xf1o 

(0xf1 is the Latin-1 code for 'Latin small letter n with tilde'; it is outside the ASCII range)

...which causes the following error when I try to insert it into PostgreSQL:

invalid byte sequence for encoding "UTF8": 0xf1 0x6f 0x20 0x20

Is there any way to avoid converting the input string into the form that causes the error above?


Solution

  • Under Python 2 you actually do want to perform a conversion: from a byte string (str) to the unicode type. So, if your code looks something like

    sql = """\
    SELECT NAME FROM dbo.latin1test WHERE ID=1
    """
    mssql_crsr.execute(sql)
    row = mssql_crsr.fetchone()
    name = row[0]
    

    then you probably want to decode the Latin-1 byte string (retrieved from SQL Server) into the unicode type before using it as a parameter to the PostgreSQL INSERT, i.e., instead of

    name = row[0]
    

    you would do

    name = unicode(row[0], 'latin1')
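
To see why the decode step helps, here is a minimal sketch that needs no database. It assumes the driver hands back the raw Latin-1 bytes shown in the question; `raw` is a stand-in for `row[0]`, and `to_unicode` is a hypothetical helper, not part of pypyodbc or psycopg2. It uses `bytes.decode`, which under Python 2 does the same job as `unicode(row[0], 'latin1')`.

```python
# -*- coding: utf-8 -*-
# Sketch only: `raw` stands in for row[0] as returned under Python 2.
# The two trailing spaces mirror the 0x20 bytes in the PostgreSQL error.
raw = b'Monta\xf1o  '


def to_unicode(value, encoding='latin1'):
    """Hypothetical helper: decode byte strings, pass other values through."""
    if isinstance(value, bytes):  # in Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


name = to_unicode(raw)

# The raw bytes are not valid UTF-8: 0xf1 announces a multi-byte sequence,
# but the bytes that follow (0x6f 0x20 0x20) are not continuation bytes.
try:
    raw.decode('utf-8')
    raw_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_is_valid_utf8 = False

# Once decoded, the text encodes cleanly to UTF-8 (n-tilde becomes 0xc3 0xb1).
utf8_bytes = name.encode('utf-8')
```

The decoded `name` can then be passed as the parameter to the psycopg2 INSERT, which encodes unicode objects for the connection. One caveat, stated as an assumption about your data: the SQL_Latin1_General_CP1 collations use Windows code page 1252, which agrees with latin1 for ñ but differs for a few characters such as €, so `'cp1252'` may be the safer encoding name if such characters can occur.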