Search code examples
pythonpandasdataframesqlalchemyorm

How to load a pandas dataframe from ORM SqlAlchemy from an existing database?


I want to load an entire database table into a Pandas DataFrame using SqlAlchemy ORM. I have successfully queried the number of rows in the table like this:

from local_modules import RemoteConnector
from sqlalchemy import Integer, Column
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.automap import automap_base
import pandas as pd

Base = automap_base()


class Calculations(Base):
    __tablename__ = "calculations"
    id = Column("ID", Integer, primary_key=True)


Base.prepare()

connection = RemoteConnector('server', 'calculations_database')
connection.connect()

Session = sessionmaker(bind=connection.engine)
session = Session()

result = session.query(Calculations).count()
print('Record count:', result)

Output:

Record count: 13915

Process finished with exit code 0

If possible and, it seems it can be done, I want to define the table using automap_base from sqlalchemy.ext.automap and not have to manually state each column. I did so with 'id' because I had an error that asked me to set a primary key (is there a better way to do this?).

In order to get any results I've been able to do the following:

results = session.query(Calculations).all()

Output:

[<__main__.Calculations object at 0x000001AF2324F510>, <__main__.Calculations object at 0x000001AF2324F6D0>, <__main__.Calculations object at 0x000001AF2324F810>, <__main__.Calculations object at 0x000001AF2324F910>, <__main__.Calculations object at 0x000001AF2324FA50>, <__main__.Calculations object at 0x000001AF2324FB90>, <__main__.Calculations object at 0x000001AF2324FCD0>, <__main__.Calculations object at 0x000001AF2324FE10>, <__main__.Calculations object at 0x000001AF2324FF50>, <__main__.Calculations object at 0x000001AF22CD40D0>, <__main__.Calculations object at 0x000001AF22CD4210>, <__main__.Calculations object at 0x000001AF22CD4350>, <__main__.Calculations object at 0x000001AF22CD4490>, <__main__.Calculations object at 0x000001AF22CD45D0>, <__main__.Calculations object at 0x000001AF22CD4710>, <__main__.Calculations object at 0x000001AF22CD4850>, <__main__.Calculations object at 0x000001AF22CD4990>, <__main__.Calculations object at 0x000001AF22CD4AD0>, <__main__.Calculations object at 0x000001AF22CD4C10>, <__main__.Calculations object at 0x000001AF22CD4D50>, <__main__.Calculations object at 0x000001AF22CD4E90>, <__main__.Calculations object at 0x000001AF22CD4FD0>, <__main__.Calculations object at 0x000001AF22CD5110>, <__main__.Calculations object at 0x000001AF22CD5250>, <__main__.Calculations object at 0x000001AF22CD53D0>, <__main__.Calculations object at 0x000001AF22CD5510>, <__main__.Calculations object at 0x000001AF22CD5650>, <__main__.Calculations object at 0x000001AF22CD5790>, <__main__.Calculations object at 0x000001AF22CD58D0>, <__main__.Calculations object at 0x000001AF22CD5A10>, <__main__.Calculations object at 0x000001AF22CD5B50>, <__main__.Calculations object at 0x000001AF22CD5C90>, <__main__.Calculations object at 0x000001AF22CD5DD0>, <__main__.Calculations object at 0x000001AF22CD5F10>, <__main__.Calculations object at 0x000001AF22CD6050>, <__main__.Calculations object at 0x000001AF22CD6190>, <__main__.Calculations object at 0x000001AF22CD62D0>, <__main__.Calculations object at 0x000001AF22CD6410>, <__main__.Calculations object at 0x000001AF22CD6550>, <__main__.Calculations object at 0x000001AF22CD6690>, <__main__.Calculations object at 0x000001AF22CD67D0>, <__main__.Calculations object at 0x000001AF22CD6910>, <__main__.Calculations object at 0x000001AF22CD6A50>, <__main__.Calculations object at 0x000001AF22CD6B90>, <__main__.Calculations object at 0x000001AF22CD6CD0>, <__main__.Calculations object at 0x000001AF22CD6E10>, <__main__.Calculations object at 0x000001AF22CD6F50>, <__main__.Calculations object at 0x000001AF22CD7090>]

This shows all the columns in the table as an object. My best attempt to extract the values has been:

for result in results:
    print(result.__dict__)

Output:

{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x00000232E0A91730>, 'id': 1.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x00000232E0A90E90>, 'id': 2.0} ... and so on

Not only I do not get the values but it does not print the columns, only the ID I defined in the class. I thought that when I did the automap_base it would transfer automatically. When I do define them they do appear, like this:

class Calculations(Base):
    __tablename__ = "Calculations"
    id = Column("Trade ID", Integer, primary_key=True)
    Amount = Column("Amount", Integer)
    Yield = Column("Yield", Integer)

Output:

{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2092090>, 'Amount': 34303.0, 'Yield': 0.01141, 'id': 1.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2091010>, 'Amount': 10000.0, 'Yield': 0.01214, 'id': 2.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2090FB0>, 'Amount': 43515.0, 'Yield': 0.01206, 'id': 3.0}
... and so on

What I would like to ultimately do is something like this as suggested in SQLAlchemy ORM conversion to pandas DataFrame:

df = pd.read_sql_query(sql=session.query(Calculation).all(), con=connection.engine)

But I get the following error:

 raise exc.ObjectNotExecutableError(statement) from err
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object: [<__main__.CALC_TFSB_INVESTMENTS object at 0x000001FF42966E50>, ... an so on

I have also tried:

df = pd.read_sql_query(sql=select(Calculations), con=connection.engine)
print(df.head())

How can I load the DataFrame? How can I automate the schema detection, I suppose using automap_base? How can I improve my code, are there other things I can add, perhaps dunder fields to make things better?


Solution

  • The answer is

    df = pd.read_sql_query(sql=select(Calculations), con=connection.engine)
    print(df.head())
    

    This does the trick

    Corralien's answer is much more detailed.