Search code examples
pythonsqlsqlitesqlalchemysubquery

What is the correct SQLAlchemy equivalent to this SQLite statement?


I'm trying to get the latest changes of each team.

SQLite statement
Works as expected.

SELECT * FROM (
  SELECT * FROM team_history ORDER BY changed_at DESC
) sub GROUP BY team

SQLAlchemy implementation
For whatever reason I've to sort with asc() instead of desc() to get the same result, that's why I doubt my implementation is correct.

session.query(TeamHistory)\
    .select_entity_from(
        session.query(TeamHistory).order_by(asc(TeamHistory.changed_at)).subquery()
    ).group_by(TeamHistory.team)\
    .all()

Environment

Python: 3.8.0
SQLAlchemy: 1.3.23

Reproducing

Schema:

CREATE TABLE "team_history" (ID integer PRIMARY KEY, changed_at TEXT, team TEXT);

Records:

[{"ID":1,"changed_at":"2021-03-02 10:00:00","team":"B"},
 {"ID":2,"changed_at":"2021-03-02 10:00:00","team":"A"},
 {"ID":3,"changed_at":"2021-03-02 10:30:00","team":"B"},
 {"ID":4,"changed_at":"2021-03-02 10:00:00","team":"A"},
 {"ID":5,"changed_at":"2021-03-02 11:30:00","team":"B"},
 {"ID":6,"changed_at":"2021-03-02 10:00:00","team":"A"},
 {"ID":7,"changed_at":"2021-03-02 11:00:00","team":"B"},
 {"ID":8,"changed_at":"2021-03-02 10:00:00","team":"A"},
 {"ID":9,"changed_at":"2021-03-02 10:00:00","team":"B"},
 {"ID":10,"changed_at":"2021-03-02 10:00:00","team":"A"},
 {"ID":11,"changed_at":"2021-03-02 10:00:00","team":"B"},
 {"ID":12,"changed_at":"2021-03-02 10:00:00","team":"A"},
 {"ID":13,"changed_at":"2021-03-02 10:00:00","team":"B"},
 {"ID":14,"changed_at":"2021-03-02 12:30:00","team":"A"},
 {"ID":15,"changed_at":"2021-03-02 10:00:00","team":"B"},
 {"ID":16,"changed_at":"2021-03-02 12:00:00","team":"A"},
 {"ID":17,"changed_at":"2021-03-02 10:00:00","team":"B"},
 {"ID":18,"changed_at":"2021-03-02 13:30:00","team":"A"},
 {"ID":19,"changed_at":"2021-03-02 10:00:00","team":"B"},
 {"ID":20,"changed_at":"2021-03-02 10:00:00","team":"A"}]

Solution
Thanks everyone!

session.query(TeamHistory)\
    .group_by(TeamHistory.team)\
    .having(func.max(TeamHistory.changed_at))\
    .all()

Solution

  • When you use:

    SELECT *
    FROM tablename
    GROUP BY somecolumn
    

    SQLite returns 1 row for each distinct value of somecolumn, but which row?
    The documentation states that the row is undefined, meaning it is chosen arbitrarily, although from my experience it seems that the 1st row in the resultset belonging to each group will be returned.
    But this is not guaranteed and queries like the one above and your query should be avoided.

    There are a number of ways to get for each team the row with the latest changed_at.
    One of them, which is valid in SQLite (although it would not work in other databases) is:

    SELECT * FROM team_history GROUP BY team HAVING MAX(changed_at)
    

    See the demo.
    So, this is the query that you should translate to SQLAlchemy (I can't help you with that).

    There are other ways also, using window functions, or EXISTS.