Search code examples
postgresqlcharacter-encodinggispostgisgdal

Encoding problems with ogr2ogr and Postgis/PostgreSQL database


In our organization, we handle GIS content in different file formats. I need to put these files into a PostGIS database, and that is done using ogr2ogr. The problem is, that the database is UTF8 encoded, and the files might have a different encoding.

I found descriptions of how I can specify the encoding by adding an options parameter to org2ogr, but appearantly it doesn't have an effect.

ogr2ogr -f PostgreSQL PG:"host=localhost user=username dbname=dbname \
password=password options='-c client_encoding=latin1'" sourcefile;

The error I recieve is:

ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "målsætning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe56c73
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "påvirkning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe57669
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

ERROR 1: INSERT command for new feature failed.
ERROR: invalid byte sequence for encoding "UTF8": 0xf8
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

Currently, my source file is a Shape file and I'm pretty sure, that it is Latin1 encoded.

What am I doing wrong here and can you help me?

Kind regards, Casper


Solution

  • That does sound like it would set the client encoding to LATIN1. Exactly what error do you get?

    Just in case ogr2ogr doesn't pass it along properly, you can also try setting the environment variable PGCLIENTENCODING to latin1.

    I suggest you double check that they are actually LATIN1. Simply running file on it will give you a good idea, assuming it's actually consistent within the file. You can also try sending it through iconv to convert it to either LATIN1 or UTF8.