Search code examples
pythonsqlalchemycomposite-primary-key

SQLAlchemy: select most recent row for all ids in a single table with composite primary key


I want to do this but in SQLAlchemy. The only difference is that rather than only being able to get the most recent record, I want to be able to get the most recent record before a given timestamp. As long as I ensure rows are never deleted, this allows me to view the database as it was on a particular timestamp.

Let's say my model looks like this:

from datetime import datetime
from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative include declarative_base
Base = declarative_base()
class User(Base):
    __tablename__ = "users"
    id_ = Column("id", Integer, primary_key=True, index=True, nullable=False)
    timestamp = Column(DateTime, primary_key=True, index=True, nullable=False, default=datetime.utcnow())
    # other non-primary attributes would go here

And I have this users table (timestamps simplified):

| id_ | timestamp |
-------------------
  0     1
  0     4
  0     6
  1     3
  2     7
  2     3

For example, if I request a snapshot at timestamp = 4, I want to get:

| id_ | timestamp |
-------------------
  0     4
  1     3
  2     3

The best I can come up with is doing it procedurally:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
db_engine = create_engine(...)
SessionLocal = sessionmaker(bind=db_engine, ...)
db_session = SessionLocal()

def get_snapshot(timestamp: datetime):
    all_versions = db_session.query(User).filter(User.timestamp <= timestamp).order_by(desc(User.timestamp))
    snapshot = []
    for v in all_versions:
        if v.id_ not in (i.id_ for i in snapshots):
            snapshot.append(v)
    return snapshot

However, this gives me a list of model objects rather than a sqlalchemy.orm.query.Query, so I have to treat the result differently to standard queries in other parts of my code. Can this be done all in the ORM?

Thanks in advance


Solution

  • Have you tried:

    all_versions = db_session.query(User, func.max(User.timestamp)).\
                   filter(User.timestamp <= timestamp).\
                   group_by(User.id_)               
    

    You can read more about generic functions in SQLAlchemy here