Search code examples
pythonsqlalchemyorm

How to resolve a many-to-one-relation on insert in sqlalchemy 2.0?


I have a large number of data, that from different acquisition systems. This is the ORM defined with SQLalchemy:

class DataTable(Base):
    __tablename__ = "data_table"
    id: Mapped[int] = mapped_column(primary_key=True)
    system_id: Mapped[int] = mapped_column(ForeignKey("system.id"))
    system: Mapped["System"] = relationship()
    data_feature: Mapped[int] = mapped_column(String(32))


class System(Base):
    __tablename__ = "system"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(16))
    system_feature: Mapped[str] = mapped_column(String(16))

Systems are already there on the DB. Now I want to put a large number of data (~200k) on the DB. Each data row comes with the system name. I want to resolve those names during DB-upload into the corresponding IDs. At the moment I resolve the system names using a pandas.merge on the client side. I remove the system name columns from the myData pandas.DataFrame and renaming the id column (stemming from the systems table)

system_table = session.execute(select(System.id, System.name)).all()
system_table_df = pandas.DataFrame(system_table)
new_data = myData.merge(system_table_df, left_on='system_name', 
right_on='name').drop(['system_name', 'name'], axis=1)
new_data.rename(columns={'id': 'system_id'}, inplace=True)

Then I can insert the data into the DB.

session.execute(insert(DataTable).values(myData.to_dict(orient='records'))
session.commit()

Is there a way to perform the resolving of the system id on the DB?

I tried with insert().returning(), which brought me quite close, but did not exactly do the job. I do not know how to catch the system_name attribute from myData and use it in the where-clause.

session.execute(
insert(DataTable).values(
    system_id=select(System.id).where(System.name == "???")
).returning(DataTable), myData.to_dict(orient="records"))

Another approach was to use session.add() like in How to create a foreignkey reference with sqlalchemy. Didn't work out either.


Solution

  • Pass a subquery that performs the lookup to the insert's .values() method

    subq = (
        sa.select(System.id).where(System.name == sa.bindparam('system')).scalar_subquery()
    )
    
    with Session.begin() as s:
        insert = sa.insert(DataTable).values(system_id=subq)
        s.execute(insert, dts)
    

    This will emit SQL like this:

    INSERT INTO data_table (system_id, data_feature) VALUES ((SELECT system.id 
    FROM system 
    WHERE system.name = ?), ?)