How does one convert a CursorResult-object into a Pandas Dataframe?
The following code results in a CursorResult-object:
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
engine = create_engine(f"mssql+pyodbc://{db_server}/{db_name}?trusted_connection=yes&driver={db_driver}")
q1 = "SELECT * FROM my_schema.my_table"
with Session(engine) as session:
results = session.execute(q1)
session.commit()
type(results)
>sqlalchemy.engine.cursor.CursorResult
As I couldn't find a way to extract relevant information from CursorResult, it attempted the following instead:
# Extracting data as we go
with Session(engine) as session:
results = session.execute(q1)
description = results.cursor.description
rows = results.all()
session.commit()
# Extracting column names
colnames = [elem[0] for elem in description]
# Extracting types
types = [elem[1] for elem in description]
# Creating dataframe
import pandas as pd
pd.DataFrame(rows, columns=colnames)
But what about the dtypes? It doesn't work if I just put them in, though it looks like they are all python types. For my use case I MUST use Session, so I cannot use the first suggestion of doing the classic:
# I cannot use
pandas.read_sql(q1, engine)
The reason for this is that I have to do multi-batch queries within the same context, which is why I am using the Session class.
IIUC, just use pd.DataFrame
constructor. dtypes
are correctly set.
# sqlalchemy==2.0.16
# pandas==2.0.2
from sqlalchemy.sql import text
with Session(engine) as session:
results = session.execute(text(q1))
df = pd.DataFrame(results)
# session.commit() # commit is irrelevant if you don't write data
Test on my database:
>>> df.head()
Scenario Attribute Process Period Region Vintage PV
0 WithHHP16HinsHE0CCS109LHP VAR_Cap EVTRANS_H-L 2014 FR None 296.071141
1 WithHHP16HinsHE0CCS109LHP VAR_Cap EVTRANS_H-M 2014 FR None 11.770909
2 WithHHP16HinsHE0CCS109LHP VAR_Cap IMPELCHIGA 2014 FR None 11851.674497
3 WithHHP16HinsHE0CCS109LHP VAR_Cap EVTRANS_H-L 2015 FR None 296.071141
4 WithHHP16HinsHE0CCS109LHP VAR_Cap EVTRANS_H-M 2015 FR None 11.770909
>>> df.dtypes
Scenario object
Attribute object
Process object
Period int64
Region object
Vintage object
PV float64
dtype: object
Edit:
rec = results.fetchone()
>>> rec
('WithHHP16HinsHE0CCS109LHP', 'VAR_Cap', 'EVTRANS_H-L', 2014, 'FR', None, 296.071141357762)
# python int --^ python float --^
>>> type(rec)
sqlalchemy.engine.row.Row