I'm using Python to read values from SQL Server (pypyodbc) and insert them into PostgreSQL (psycopg2)
A value in the NAME field has come up that is causing errors:
Montaño
The value is existing in my MSSQL database just fine (SQL_Latin1_General_CP1_CI_AS encoding), and can be inserted into my PostgreSQL database just fine (UTF8) using PGAdmin and an insert statement.
The problem is selecting it using python causes the value to be converted to:
Monta\xf1o
(xf1 is ASCII for 'Latin small letter n with tilde')
...which is causing the following error to be thrown when trying to insert into PostgreSQL:
invalid byte sequence for encoding "UTF8": 0xf1 0x6f 0x20 0x20
Is there any way to avoid the conversion of the input string to the string that is causing the error above?
Under Python_2 you actually do want to perform a conversion from a basic string to a unicode
type. So, if your code looks something like
sql = """\
SELECT NAME FROM dbo.latin1test WHERE ID=1
"""
mssql_crsr.execute(sql)
row = mssql_crsr.fetchone()
name = row[0]
then you probably want to convert the basic latin1
string (retrieved from SQL Server) to the type unicode
before using it as a parameter to the PostgreSQL INSERT, i.e., instead of
name = row[0]
you would do
name = unicode(row[0], 'latin1')