Search code examples
pandassqlitepycharmanacondapg8000

pandas unable to write to Postgres db throws "KeyError: ("SELECT name FROM sqlite_master ..."


I have created a package allowing a user to write data to either a sqlite or Postgres db. I created a module for connecting to the db and a separate module that provides the writing functionality. In the latter module the write is a straightforward pandas internal function call:

indata.to_sql('pay_' + table, con, if_exists='append', index=False)

Writing to an sqlite db (with connection using 'sqlite3') is successful however when writing to a Postgres db I get the following error:

Traceback (most recent call last):
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1778, in execute
    ps = cache['ps'][key]
KeyError: ("SELECT name FROM sqlite_master WHERE type='table' AND name=?;", ((705, 0, <function Connection.__init__.<locals>.text_out at 0x7fc3205fb510>),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 861, in execute
    self._c.execute(self, operation, args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1837, in execute
    self.handle_messages(cursor)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1976, in handle_messages
    raise self.error
pg8000.core.ProgrammingError: {'S': 'ERROR', 'V': 'ERROR', 'C': '42P01', 'M': 'relation "sqlite_master" does not exist', 'P': '18', 'F': 'parse_relation.c', 'L': '1180', 'R': 'parserOpenTable'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py", line 1610, in execute
    raise_with_traceback(ex)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/compat/__init__.py", line 46, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 861, in execute
    self._c.execute(self, operation, args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1837, in execute
    self.handle_messages(cursor)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1976, in handle_messages
    raise self.error
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': {'S': 'ERROR', 'V': 'ERROR', 'C': '42P01', 'M': 'relation "sqlite_master" does not exist', 'P': '18', 'F': 'parse_relation.c', 'L': '1180', 'R': 'parserOpenTable'}

I traced the error to the following file:

/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py

What seems to be happening is that the '.to_sql' function is configured to try to write to a db named 'sqlite_master' at this point in the 'sql.py' file:

    def has_table(self, name, schema=None):
    # TODO(wesm): unused?
    # escape = _get_valid_sqlite_name
    # esc_name = escape(name)

    wld = "?"
    query = (
        "SELECT name FROM sqlite_master " "WHERE type='table' AND name={wld};"
    ).format(wld=wld)

    return len(self.execute(query, [name]).fetchall()) > 0

Looking more closely at the errors you can see that the connection is correctly made to the db but that pandas is looking for an sqlite db:

enter image description here

I know that the db name was one I used several half a year ago when I first started working with sqlite so I'm thinking that somewhere I set a configuration value. So:

  1. is my reasoning correct?
  2. if so, how do I change the configuration?
  3. if not, what is possibly going on?

Solution

  • Per pandas.DataFrame.to_sql documentation:

    con : sqlalchemy.engine.Engine or sqlite3.Connection

    Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects.

    This means only SQLite allows a raw connection for the to_sql method. All other RDBMs including Postgres must use an SQLAlchemy connection for this method to create structures and append data. Do note: read_sql does not require SQLAlchemy since it does not make persistent changes.

    Therefore, this raw DB-API connection cannot work:

    import psycopg2
    con = psycopg2.connect(host="localhost", port=5432, dbname="mydb", user="myuser", password="mypwd")
    
    indata.to_sql('pay_' + table, con, if_exists='append', index=False)
    

    However, this SQLAlchemy connection can work:

    from sqlalchemy import create_engine    
    
    engine = create_engine('postgresql+psycopg2://myuser:mypwd@localhost:5432/mydb')
    
    indata.to_sql('pay_' + table, engine, if_exists='append', index=False)
    

    Better use SQLAlchemy for both databases, here for SQLite:

    engine = create_engine("sqlite:///path/to/mydb.db")