Search code examples
pythonmysqltransactionspyramidwaitress

Random NoTransaction in Pyramid


I'm having trouble identifying the source of transaction.interfaces.NoTransaction errors within my Pyramid App. I don't see any patterns to when the error happens, so to me it's quite random.

This app is a (semi-) RESTful API and uses SQLAlchemy and MySQL. I'm currently running within a docker container that connects to an external (bare metal) MySQL instance on the same host OS.

Here's the stack trace for a login attempt within the App. This error happened right after another login attempt that was actually successful.

2020-06-15 03:57:18,982 DEBUG [txn.140501728405248:108][waitress-1] new transaction
2020-06-15 03:57:18,984 INFO  [sqlalchemy.engine.base.Engine:730][waitress-1] BEGIN (implicit)
2020-06-15 03:57:18,984 DEBUG [txn.140501728405248:576][waitress-1] abort
2020-06-15 03:57:18,985 ERROR [waitress:357][waitress-1] Exception while serving /auth
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/waitress/channel.py", line 350, in service
    task.service()
  File "/usr/local/lib/python3.8/site-packages/waitress/task.py", line 171, in service
    self.execute()
  File "/usr/local/lib/python3.8/site-packages/waitress/task.py", line 441, in execute
    app_iter = self.channel.server.application(environ, start_response)
  File "/usr/local/lib/python3.8/site-packages/pyramid/router.py", line 270, in __call__
    response = self.execution_policy(environ, self)
  File "/usr/local/lib/python3.8/site-packages/pyramid_retry/__init__.py", line 127, in retry_policy
    response = router.invoke_request(request)
  File "/usr/local/lib/python3.8/site-packages/pyramid/router.py", line 249, in invoke_request
    response = handle_request(request)
  File "/usr/local/lib/python3.8/site-packages/pyramid_tm/__init__.py", line 178, in tm_tween
    reraise(*exc_info)
  File "/usr/local/lib/python3.8/site-packages/pyramid_tm/compat.py", line 36, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/pyramid_tm/__init__.py", line 135, in tm_tween
    userid = request.authenticated_userid
  File "/usr/local/lib/python3.8/site-packages/pyramid/security.py", line 381, in authenticated_userid
    return policy.authenticated_userid(self)
  File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 208, in authenticated_userid
    result = self._authenticate(request)
  File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 199, in _authenticate
    session = self._get_session_from_token(token)
  File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 320, in _get_session_from_token
    session = service.get(session_id)
  File "/opt/REDACTED-api/REDACTED_api/service/__init__.py", line 122, in get
    entity = self.queryset.filter(self.Meta.model.id == entity_id).first()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3375, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3149, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3481, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3502, in _execute_and_instances
    conn = self._get_bind_args(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3517, in _get_bind_args
    return fn(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3496, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1138, in connection
    return self._connection_for_bind(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1146, in _connection_for_bind
    return self.transaction._connection_for_bind(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 458, in _connection_for_bind
    self.session.dispatch.after_begin(self.session, self, conn)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
    fn(*args, **kw)
  File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 268, in after_begin
    join_transaction(
  File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 233, in join_transaction
    DataManager(
  File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 89, in __init__
    transaction_manager.get().join(self)
  File "/usr/local/lib/python3.8/site-packages/transaction/_manager.py", line 91, in get
    raise NoTransaction()
transaction.interfaces.NoTransaction

The trace shows that the execution eventually reaches my project, but only my custom authentication policy. And it fails right where the database should be queried for the user.

What intrigues me here is the third line on the stack trace. It seems Waitress somehow aborted the transaction it created? Any clue why?


EDIT: Here's the code where that happens: policy.py:320

def _get_session_from_token(self, token) -> UserSession:
    try:
        session_id, session_secret = self.parse_token(token)
    except InvalidToken as e:
        raise SessionNotFound(e)

    service = AuthService(self.dbsession, None)

    try:
        session = service.get(session_id)  # <---- Service Class called here 
    except NoResultsFound:
        raise SessionNotFound("Invalid session found Request headers. "
                              "Session id: %s".format(session_id))

    if not service.check_session(session, session_secret):
        raise SessionNotFound("Session signature does not match")

    now = datetime.now(tz=pytz.UTC)
    if session.validity < now:
        raise SessionNotFound(
            "Current session ID {session_id} is expired".format(
                session_id=session.id
            )
        )

    return session

And here is an a view on the that service class method:

class AuthService(ModelService):
    class Meta:
        model = UserSession
        queryset = Query(UserSession)
        search_fields = []
        order_fields = [UserSession.created_at.desc()]

    # These below are from the generic ModelClass father class
    def __init__(self, dbsession: Session, user_id: str):
        self.user_id = user_id
        self.dbsession = dbsession
        self.Meta.queryset = self.Meta.queryset.with_session(dbsession)
        self.logger = logging.getLogger("REDACTED")

    @property
    def queryset(self):
        return self.Meta.queryset

    def get(self, entity_id) -> Base:
        entity = self.queryset.filter(self.Meta.model.id == entity_id).first()

        if not entity:
            raise NoResultsFound(f"Could not find requested ID {entity_id}")

As you can see, the there's already some exception treatment. I really don't see what other exception I could try to catch on AuthService.get


Solution

  • I found the solution to be much simpler than tinkering inside Pyramid or SQLAlchemy.

    Debugging my Authentication Policy closely, I found out that my it was keeping a sticky reference for the dbsession. It was stored on the first request ever who used it, and never released.

    The first request works as expected, the following one fails: My understanding is that the object is still in memory while the app is running, and after the initial transaction is closed. The second request has a new connection, and a new transaction, but the object in memory still points to the previous one, that when used ultimately causes this.

    What I don't understand is why the exception didn't happen sometimes. As I mentioned initially, it was seemingly random.

    Another thing that I struggled with was in writing a test case to expose the issue. On my tests, the issue never happens because I have (and I've never seen it done differently) a single connection and a single transaction throughout the entire testing session, as opposed of a new connection/transaction per request, so I have not found no way to actually reproduce.

    Please let me know if that makes sense, and if you can shed a light on how to expose the bug on a test case.