Search code examples
pythonflasksqlalchemy

SQLAlchemy, Flask and cross-contamination?


I have taken over a flask app, but it does not use the flask-sqlalchemy plugin. I am having a hard time wrapping my head around how it's set up.

It has a database.py file.

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, scoped_session, Session


_session_factory = None
_scoped_session_cls = None
_db_session: Session = None


def _get_session_factory():
    global _session_factory
    if _session_factory is None:
        _session_factory = sessionmaker(
            bind=create_engine(CONNECTION_URL)
        )
    return _session_factory


def new_session():
    session_factory = _get_session_factory()
    return session_factory()


def new_scoped_session():
    global _scoped_session_cls
    if _scoped_session_cls is None:
        session_factory = _get_session_factory()
        if not session_factory:
            return
        _scoped_session_cls = scoped_session(session_factory)
    return _scoped_session_cls()


def init_session():
    global _db_session

    if _db_session is not None:
        log.warning("already init")
    else:
        _db_session = new_scoped_session()
    return _db_session


def get_session():
    return _db_session

We we start up the flask app, it calls database.init_session() and then anytime we want to use the database it calls database.get_session().

Is this a correct/safe way to interact with the database? What happens if there are two requests being processed at the same time by different threads? Will this result in cross-contamination with both using the same session


Solution

  • To explain what is happening in your code, let's first dive into how sqlalchemy connects to your database:

    1. Creating an Engine:

    At this point, you can connect with your database directly:

    engine = create_engine(...)
    
    with engine.connect() as connection:
        result = connection.execute(...)
    

    Do note, objects created/altered within the context of this connection are not guaranteed to share state outside of the connection context, until the transaction is completed / connection closed.

    To perform more complex queries (ex: mix of select and insert), you will want to use a Session.

    2. Opening a session:

    • A session is created from an engine object:
      with Session(engine) as session:
          ...
      
    • More often, a session will be created by the sessionmaker() factory method:
      Session = sessionmaker(engine)
      
    • Sessions can be used across a project, and are often how objects are mapped to a shared global state.

    3. Examining the code above:

    The oversimplified purpose of the code above is to create a session object, via get_session().

    Initially, this will return None, since this is the value of _db_session.

    Therefore, you must call init_session() which will create the various objects needed to return a session from get_session().

    Here's what happens within init_session():

    1. Calls new_scoped_session() which will return a session_factory
    2. Calls _get_session_factory() which will create the session_factory -> note how it creates an engine from the CONNECTION_URL
    3. Calls scoped_session(session_factory) which creates a scoped session and assigned to _scoped_session_cls. Note: This performs very similarly to a regular session, except is even more isolated (and safe).

    Lots of the complexity is related to the caching of state. Essentially, the code is doing the following:

    _session_factory = sessionmaker(
        bind=create_engine(CONNECTION_URL)
    )
    
    _scoped_session_cls = scoped_session(session_factory)
    
    def new_session():
        return _scoped_session_cls()
    

    I hope this was helpful. Good luck!