Tags: python, postgresql, pandas, csv, odo

Using odo to load CSV -> Postgres on AWS


I'm trying to do something fairly simple, but either odo is broken or I don't understand how datashapes work in the context of this package.

The CSV file:

email,dob
tony@gmail.com,1982-07-13
blah@haha.com,1997-01-01
...

The code:

from odo import odo
import pandas as pd

df = pd.read_csv("...")
connection_str = "postgresql+psycopg2:// ... "

t = odo('path/to/data.csv', connection_str, dshape='var * {email: string, dob: datetime}')

The error:

AssertionError: datashape must be Record type, got 0 * {email: string, dob: datetime}

It's the same error if I try to go directly from a DataFrame -> Postgres as well:

t = odo(df, connection_str, dshape='var * {email: string, dob: datetime}')

A few other things that don't fix the problem: 1) removing the header line from the CSV file, 2) changing var to the actual number of rows in the DataFrame.

What am I doing wrong here?


Solution

  • Does connection_str include a table name? Adding one fixed a similar issue for me, although that was with a SQLite database.

    It should look something like:

    connection_str = "postgresql+psycopg2://user:password@host:port/your_database_name::data"
    t = odo(df, connection_str, dshape='var * {email: string, dob: datetime}')
    

    where 'data', the part after '::' in connection_str, is the name of the new table.

    See also:

    python odo sql AssertionError: datashape must be Record type, got 0 * {...}

    https://github.com/blaze/odo/issues/580
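
To make the fix above harder to forget, here is a minimal sketch of a helper that appends the '::table' suffix before the URI is handed to odo. The names with_table and load_csv, and the credentials, host, and database in the example, are hypothetical and not part of odo. The likely reason for the error is that, without a table name, odo treats the target as a whole database rather than a single table, but that is my reading of the behaviour, not something the question confirms.

```python
def with_table(db_uri, table):
    """Return an odo-style URI of the form '<database uri>::<table>'.

    Hypothetical helper: leaves the URI alone if it already names a table.
    """
    if "::" in db_uri:
        return db_uri
    return "{}::{}".format(db_uri, table)


def load_csv(csv_path, db_uri, table):
    """Load a CSV file into the given table using odo.

    odo is imported lazily so that with_table can be used without it.
    The dshape is the one from the question.
    """
    from odo import odo
    target = with_table(db_uri, table)
    return odo(csv_path, target,
               dshape='var * {email: string, dob: datetime}')


# Placeholder credentials, host, and database name (assumptions).
uri = with_table("postgresql+psycopg2://user:password@localhost:5432/mydb", "data")
print(uri)
```

Calling load_csv('path/to/data.csv', "postgresql+psycopg2://user:password@localhost:5432/mydb", "data") would then be the equivalent of the corrected odo call, assuming the database is reachable.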