python · postgresql · sqlalchemy · python-unicode

SQLAlchemy and Postgres UnicodeDecodeError


The problem I am facing is the same as the one posted here: SQLAlchemy and UnicodeDecodeError. When I fetch results from a table I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The two solutions proposed there are calling sys.setdefaultencoding('utf8') in Python and passing an additional charset argument in the connection string, as follows:

conn_str='postgresql+psycopg2://'+str(dbhost)+':' + str(port) + '/postgres?charset=utf8'

Neither solution fixes the problem. What else can I debug? The table essentially contains GIS data for various countries, so it has special characters.


Solution

  • It seems that the client encoding differs from the server encoding. You can verify this by issuing these commands (a Python sketch of the same check appears at the end of this answer):

    SHOW client_encoding; --Equivalent to: SELECT current_setting('client_encoding');
    SHOW server_encoding; --Equivalent to: SELECT current_setting('server_encoding');
    

    PostgreSQL automatically converts from the server encoding to the client encoding; the two are probably different in your environment. You can configure client_encoding in several ways (the second sketch at the end of this answer shows how to do it from SQLAlchemy):

    • Using the SET command when opening a connection in your app: SET client_encoding = 'UTF-8';
    • Using the set_config function when opening a connection in your app: SELECT set_config('client_encoding', 'UTF-8', true);
    • Configuring the PGCLIENTENCODING environment variable in your OS: export PGCLIENTENCODING=UTF8
    • Setting client_encoding in the PostgreSQL configuration file (postgresql.conf)
    • Using ALTER SYSTEM (you must reload the configuration afterwards with SELECT pg_reload_conf();): ALTER SYSTEM SET client_encoding = 'UTF-8';

    Update: Unfortunately it's not possible to enable automatic conversion from SQL_ASCII.

    If the client character set is defined as SQL_ASCII, encoding conversion is disabled, regardless of the server's character set. Just as for the server, use of SQL_ASCII is unwise unless you are working with all-ASCII data.

    Quoted from the PostgreSQL documentation.
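
    To make the verification step concrete, here is a minimal Python sketch (assuming the psycopg2 driver and a placeholder connection URL, not your real credentials) that runs the same SHOW commands through SQLAlchemy and, if needed, overrides the encoding for the current session only:

    from sqlalchemy import create_engine, text

    # Placeholder URL; substitute your own user, password, host, port and database.
    engine = create_engine("postgresql+psycopg2://user:password@dbhost:5432/postgres")

    with engine.connect() as conn:
        client = conn.execute(text("SHOW client_encoding")).scalar()
        server = conn.execute(text("SHOW server_encoding")).scalar()
        print(f"client_encoding={client}, server_encoding={server}")

        # Per-session override, equivalent to the SET command listed above.
        conn.execute(text("SET client_encoding = 'UTF-8'"))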
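
    If you prefer to fix this from the application side, the SQLAlchemy psycopg2 dialect also accepts a client_encoding argument to create_engine, which applies the setting to every connection in the pool (again a sketch with a placeholder URL):

    from sqlalchemy import create_engine, text

    # Placeholder URL; client_encoding is handled by the psycopg2 dialect,
    # which sets the client encoding on every new connection it opens.
    engine = create_engine(
        "postgresql+psycopg2://user:password@dbhost:5432/postgres",
        client_encoding="utf8",
    )

    with engine.connect() as conn:
        print(conn.execute(text("SHOW client_encoding")).scalar())  # expected: UTF8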