Tags: python, sql-server, pandas, sqlalchemy, temp-tables

Use temp table with SQLAlchemy


I am trying to use a temp table with SQLAlchemy and join it against an existing table. This is what I have so far:

engine = db.get_engine(db.app, 'MY_DATABASE')
df = pd.DataFrame({"id": [1, 2, 3], "value": [100, 200, 300], "date": [date.today(), date.today(), date.today()]})
temp_table = db.Table('#temp_table',
                      db.Column('id', db.Integer),
                      db.Column('value', db.Integer),
                      db.Column('date', db.DateTime))
temp_table.create(engine)
df.to_sql(name='tempdb.dbo.#temp_table',
          con=engine,
          if_exists='append',
          index=False)
query = db.session.query(ExistingTable.id).join(temp_table, temp_table.c.id == ExistingTable.id)
out_df = pd.read_sql(query.statement, engine)
temp_table.drop(engine)
return out_df.to_dict('records')

This doesn't return any results because the INSERT statements issued by to_sql don't get run (I think this is because they are executed using sp_prepexec, but I'm not entirely sure about that).

I then tried just writing out the SQL statement (CREATE TABLE #temp_table..., INSERT INTO #temp_table..., SELECT [id] FROM...) and then running pd.read_sql(query, engine). I get the error message

This result object does not return rows. It has been closed automatically.

I guess this is because the statement does more than just SELECT?

How can I fix this issue? Either solution would work, although the first would be preferable as it avoids hard-coded SQL. To be clear, I can't modify the schema in the existing database, since it's a vendor database.


Solution

  • If the number of records to be inserted into the temporary table is small or moderate, one possibility is to use a literal subquery or a VALUES-style CTE instead of creating a temporary table.

    Assume the existing table is mapped to a model like this:

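    import datetime
    import pandas as pd
    import sqlalchemy as sa
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+; older versions: sqlalchemy.ext.declarative
    Base = declarative_base()
    # MODEL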
    class ExistingTable(Base):
        __tablename__ = 'existing_table'
        id = sa.Column(sa.Integer, primary_key=True)
        name = sa.Column(sa.String)
        # ...
    

    Assume also that the following data is to be inserted into the temp table:

    # This data retrieved from another database and used for filtering
    rows = [
        (1, 100, datetime.date(2017, 1, 1)),
        (3, 300, datetime.date(2017, 3, 1)),
        (5, 500, datetime.date(2017, 5, 1)),
    ]
    

    Create a CTE or a sub-query containing that data:

    stmts = [
        # @NOTE: optimization to reduce the size of the statement:
        # cast the types only in the first row; the DB engine infers them for the rest.
        # Columns are passed positionally (SQLAlchemy 1.4+); on 1.3 and older,
        # wrap them in a list instead: sa.select([...]).
        sa.select(
            sa.cast(sa.literal(i), sa.Integer).label("id"),
            sa.cast(sa.literal(v), sa.Integer).label("value"),
            sa.cast(sa.literal(d), sa.DateTime).label("date"),
        ) if idx == 0 else
        sa.select(sa.literal(i), sa.literal(v), sa.literal(d))  # no type cast
    
        for idx, (i, v, d) in enumerate(rows)
    ]
    subquery = sa.union_all(*stmts)
    
    # Choose one option below.
    # I personally prefer B because one could reuse the CTE multiple times in the same query
    # subquery = subquery.alias("temp_table")  # option A
    subquery = subquery.cte(name="temp_table")  # option B
    

    Create the final query with the required joins and filters:

    query = (
        session
        .query(ExistingTable.id)
        .join(subquery, subquery.c.id == ExistingTable.id)
        # .filter(subquery.c.date >= XXX_DATE)
    )
    
    # TEMP: Test result output
    for res in query:
        print(res)    
    

    Finally, load the result into a pandas DataFrame:

    out_df = pd.read_sql(query.statement, engine)
    result = out_df.to_dict('records')
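
For reference, the steps above can be put together into one runnable sketch. It rests on several assumptions that are not part of the original answer: SQLAlchemy 1.4 or newer, an in-memory SQLite database standing in for SQL Server, made-up seed rows in `existing_table`, and a hypothetical helper named `values_cte`. pandas is left out so that the sketch only needs SQLAlchemy.

```python
import datetime

import sqlalchemy as sa
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ExistingTable(Base):
    # Stand-in for the vendor table; the real schema is not shown in the question.
    __tablename__ = "existing_table"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)


# Assumption: in-memory SQLite replaces SQL Server so the sketch runs anywhere.
engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)

rows = [
    (1, 100, datetime.date(2017, 1, 1)),
    (3, 300, datetime.date(2017, 3, 1)),
    (5, 500, datetime.date(2017, 5, 1)),
]


def values_cte(rows, name="temp_table"):
    """Hypothetical helper: UNION ALL of literal SELECTs wrapped in a named CTE."""
    stmts = []
    for idx, (i, v, d) in enumerate(rows):
        if idx == 0:
            # Only the first SELECT names and casts the columns.
            cols = [
                sa.cast(sa.literal(i), sa.Integer).label("id"),
                sa.cast(sa.literal(v), sa.Integer).label("value"),
                # Carried along to mirror the original; not selected below.
                sa.cast(sa.literal(d), sa.DateTime).label("date"),
            ]
        else:
            cols = [sa.literal(i), sa.literal(v), sa.literal(d)]
        stmts.append(sa.select(*cols))
    return sa.union_all(*stmts).cte(name=name)


with Session(engine) as session:
    # Made-up seed data: ids 1..4 exist, id 5 does not.
    session.add_all([ExistingTable(id=n, name=f"row {n}") for n in (1, 2, 3, 4)])
    session.commit()

    temp = values_cte(rows)
    query = (
        session.query(ExistingTable.id)
        .join(temp, temp.c.id == ExistingTable.id)
        .order_by(ExistingTable.id)
    )
    matched_ids = [row.id for row in query]

    # Every literal becomes one bind parameter: rows x columns in total.
    n_params = len(query.statement.compile(dialect=engine.dialect).params)

print(matched_ids)
print(n_params)
```

Because each literal is sent as a bind parameter, the statement grows by one parameter per row and column. SQL Server rejects requests with more than 2,100 parameters, which is one concrete reason this approach suits only small or moderate row counts; with three columns that is roughly 700 rows per statement.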