Search code examples
pythonpandassqlalchemy

How to read CursorResult to Pandas in SQLAlchemy?


How does one convert a CursorResult-object into a Pandas Dataframe?

The following code results in a CursorResult-object:

from sqlalchemy.orm import Session
from sqlalchemy import create_engine

engine = create_engine(f"mssql+pyodbc://{db_server}/{db_name}?trusted_connection=yes&driver={db_driver}")
q1 = "SELECT * FROM my_schema.my_table"

with Session(engine) as session:
    results = session.execute(q1)
    session.commit()

type(results)
>sqlalchemy.engine.cursor.CursorResult

As I couldn't find a way to extract relevant information from CursorResult, it attempted the following instead:

# Extracting data as we go
with Session(engine) as session:
    results = session.execute(q1)
    description = results.cursor.description
    rows = results.all()
    session.commit()

# Extracting column names
colnames = [elem[0] for elem in description]

# Extracting types
types = [elem[1] for elem in description]

# Creating dataframe
import pandas as pd
pd.DataFrame(rows, columns=colnames)

But what about the dtypes? It doesn't work if I just put them in, though it looks like they are all python types. For my use case I MUST use Session, so I cannot use the first suggestion of doing the classic:

# I cannot use
pandas.read_sql(q1, engine)

The reason for this is that I have to do multi-batch queries within the same context, which is why I am using the Session class.


Solution

  • IIUC, just use pd.DataFrame constructor. dtypes are correctly set.

    # sqlalchemy==2.0.16
    # pandas==2.0.2
    from sqlalchemy.sql import text
    
    with Session(engine) as session:
        results = session.execute(text(q1))
        df = pd.DataFrame(results)
        # session.commit()  # commit is irrelevant if you don't write data
    

    Test on my database:

    >>> df.head()
                        Scenario Attribute      Process  Period Region Vintage            PV
    0  WithHHP16HinsHE0CCS109LHP   VAR_Cap  EVTRANS_H-L    2014     FR    None    296.071141
    1  WithHHP16HinsHE0CCS109LHP   VAR_Cap  EVTRANS_H-M    2014     FR    None     11.770909
    2  WithHHP16HinsHE0CCS109LHP   VAR_Cap   IMPELCHIGA    2014     FR    None  11851.674497
    3  WithHHP16HinsHE0CCS109LHP   VAR_Cap  EVTRANS_H-L    2015     FR    None    296.071141
    4  WithHHP16HinsHE0CCS109LHP   VAR_Cap  EVTRANS_H-M    2015     FR    None     11.770909
    
    >>> df.dtypes
    Scenario      object
    Attribute     object
    Process       object
    Period         int64
    Region        object
    Vintage       object
    PV           float64
    dtype: object
    

    Edit:

    rec = results.fetchone()
    
    >>> rec
    ('WithHHP16HinsHE0CCS109LHP', 'VAR_Cap', 'EVTRANS_H-L', 2014, 'FR', None, 296.071141357762)
    #                                           python int  --^       python float --^
    
    >>> type(rec)
    sqlalchemy.engine.row.Row