Search code examples
postgresqlutf-8odbcdatabase-connectionstata

Loading Data from PostgreSQL into Stata


When I load data from PostgreSQL into Stata some of the data has unexpected characters appended. How can I avoid this?

Here is the Stata code I am using:

odbc query mydatabase, schema $odbc 
odbc load, exec("SELECT * FROM my_table") $odbc allstring

Here is an example of the output I see:

198734/0     one/0/r      April/0/0/0
893476/0     two/0/r      May/0/0/0
324192/0     three/0/r    June/0/0/0

In Postgres the data is:

198734     one      April
893476     two      May
324192     three    June

I see this in mostly in larger tables and with fields of all datatypes in PostgreSQL. If I export the data to a csv there are no trailing characters.

The odbci.ini file I am using looks like this:

[ODBC Data Sources]
mydatabase = PostgreSQL

[mydatabase]
Debug = 1
CommLog = 1
ReadOnly = no
Driver = /usr/lib64/psqlodbcw.so
Servername = myserver
Servertype = postgres
FetchBufferSize = 99
Port = 5432
Database = mydatabase

[Default]
Driver = /usr/lib64/psqlodbcw.so

I am using odbc version unixODBC 2.3.1 and PostgreSQL version 9.4.9 with server encoding UTF8 and Stata version 14.1.

What is causing the unexpected characters in the data imported into Stata? I know that I can clean the data once it’s in Stata but I would like to avoid this.


Solution

  • I was able to fix this by adding the line

    set odbcdriver ansi
    

    to the Stata code.