Search code examples
pythonpostgresqlsqlalchemydatabase-migrationgoogle-cloud-sql

Using SQLAlchemy to migrate databases (sqlite to Postgres CloudSQL)


Trying to migrate the database for an early project of mine to the cloud. Although the code that built everything is janky, the db structure and data itself are fairly sound. I could probably figure out a dump method of migrating everything (pgdump, etc), but I still have much to learn about this stuff so I'd rather get the experience by doing it step-by-step.

Source: A ~1gb sqlite database file

Destination: Google CloudSQL running Postgres v9.6

Already created the tables in the cloud db, using the same schema & table names as the sqlite db. Not worried about schema enforcement errors because I haven't yet defined foreign keys in the cloud.

The Plan: Create distinct, concurrent SQLAlchemy connections to each database, then read sqlite --> write to CloudSQL. Went back and defined data structures for each table using SQLAlchemy. Snippet from models.py:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base, declared_attr

Base = declarative_base()

class PublicMixin(object):
    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()

class Category(PublicMixin, Base):
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Player(PublicMixin, Base):
    id = Column(Integer, primary_key=True)
    name = Column(String)
    username = Column(String)
    notes = Column(String)
[...]

I then duplicated this file as models_lite.py so I could import each model without potential for interference. Here's the migration.py file I tried to run as a proof of concept:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from models import Base, Category, Player
from models_sqlite import Base as Base_lite, Category as Category_lite, Player as Player_lite

# SQLite db
engine_lite = create_engine('sqlite:///mydb.sqlite')
Base_lite.metadata.bind = engine_lite
LiteSession = sessionmaker()
LiteSession.bind = engine_lite
session_lite = LiteSession()

# CloudSQL, via local proxy
engine_cloud = create_engine('postgresql+psycopg2://USER:PW@/DBNAME?host=/cloudsql/INSTANCE')
Base.metadata.bind = engine_cloud
CloudSession = sessionmaker()
CloudSession.bind = engine_cloud
session_cloud = CloudSession()

category_lite = session_lite.query(Category_lite).all()
category_cloud = Category()

for c in category_lite:
    category_cloud = c
    session_cloud.add(category_cloud)

session_cloud.commit()

Running this yields the following error:

File "postgres migration.py", line 68, in <module>
    session_cloud.add(category_cloud)
[...]
sqlalchemy.exc.InvalidRequestError: Object '<Category at 0x11141b908>' is already attached to session '1' (this is '2')

Explicitly setting each column inside the for loop works (i.e.: category_cloud.id = c.id), but there must be a way to avoid having to do this for every column in every table. How should I approach this?


Solution

  • This kind of data transfer operation is much more easily achieved with sqlalchemy core rather than the orm. There is no benefit to mapping database data to objects here if they are only going to be immediately written to another database, it just adds complexity and slows things down. The following code will iterate through every table in Base, select all the columns in the sqlite database and write them one table at a time to the cloud database.

    from sqlalchemy import create_engine, select
    from models import Base
    
    engine_lite = create_engine('sqlite:///mydb.sqlite')
    engine_cloud = create_engine('postgresql+psycopg2://USER:PW@/DBNAME?host=/cloudsql/INSTANCE')
    
    with engine_lite.connect() as conn_lite:
        with engine_cloud.connect() as conn_cloud:
            for table in Base.metadata.sorted_tables:
                data = [dict(row) for row in conn_lite.execute(select(table.c))]
                conn_cloud.execute(table.insert().values(data))