Search code examples
pythonsql-serverpandaspyodbc

Do I need to close pyodbc sql server connection when reading the data into the Pandas Dataframe?


I am confused how to use context manager with pyodbc connection. As far as I know, it is usually necessary to close the database connection and using context manager is a good practice for that (for pyodbc, I saw some examples which closes the cursor only). Long story short, I am creating a python app which pulls data from sql server and want to read them into a Pandas Dataframe.

I did some search on using contextlib and wrote an script sql_server_connection:

import pyodbc
import contextlib

@contextlib.contextmanager
def open_db_connection(server, database):
    """
    Context manager to automatically close DB connection. 
    """
    conn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';Trusted_Connection=yes;')
    try:
        yield
    except pyodbc.Error as e:
        print(e)
    finally:
        conn.close()

I then called this in another script:

from sql_server_connection import open_db_connection
with open_db_connection(server, database) as conn:
    df = pd.read_sql_query(query_string, conn)

which raises this error:

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\sql.py", line 436, in read_sql_query
    return pandas_sql.read_query(
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\sql.py", line 2116, in read_query
    cursor = self.execute(*args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\sql.py", line 2054, in execute
    cur = self.con.cursor()
AttributeError: 'NoneType' object has no attribute 'cursor'

I didn't define a cursor here because I expect that Pandas handle it as it did before I think about closing the connection. If the approach above is wrong how would I close the connection? Or does pyodbc handle it? Thanks!


Solution

  • You yield nothing (None) from your open_db_connection.

    import pyodbc
    import contextlib
    
    @contextlib.contextmanager
    def open_db_connection(server, database):
        """
        Context manager to automatically close DB connection. 
        """
        conn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';Trusted_Connection=yes;')
        try:
            yield conn  # yield the connection not None
        except pyodbc.Error as e:
            print(e)
        finally:
            conn.close()
    

    Also, I should point out two things:

    1. pyodbc does not expect the user to close the connections (docs):

    Connections are automatically closed when they are deleted (typically when they go out of scope) so you should not normally need to call this, but you can explicitly close the connection if you wish.

    1. pandas expects a SQLAlchemy connectable, a SQLAlchemy URL str or a sqlite3 connections in its read_sql_* functions (docs), not a pyODBC connection, so your mileage may vary.

    pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)

    con: SQLAlchemy connectable, str, or sqlite3 connection

    Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.


    I would probably simplify to the following:

    No need for error handling, this function must return a connection (also split up the string on several line and used f-strings rather than concat).

    import pyodbc
    
    def open_db_connection(server, database):
        """
        Context manager to automatically close DB connection. 
        """
        return pyodbc.connect(
            "DRIVER={SQL Server};"
            f"SERVER={server};"
            f"DATABASE={database};"
            "Trusted_Connection=yes;"
        )
    

    Call the above function directly in the argument list to scope the connection to inside read_sql_query. Might want to do error handling here as well, but that depends on what you're writing.

    import pandas as pd
    
    from sql_server_connection import open_db_connection
    
    df = pd.read_sql_query(
        query_string,
        open_db_connection(server, database),
    )