I'm trying to get the latest changes of each team.
SQLite statement
Works as expected.
SELECT * FROM (
SELECT * FROM team_history ORDER BY changed_at DESC
) sub GROUP BY team
SQLAlchemy implementation
For whatever reason I've to sort with asc()
instead of desc()
to get the same result, that's why I doubt my implementation is correct.
session.query(TeamHistory)\
.select_entity_from(
session.query(TeamHistory).order_by(asc(TeamHistory.changed_at)).subquery()
).group_by(TeamHistory.team)\
.all()
Environment
Python: 3.8.0
SQLAlchemy: 1.3.23
Reproducing
Schema:
CREATE TABLE "team_history" (ID integer PRIMARY KEY, changed_at TEXT, team TEXT);
Records:
[{"ID":1,"changed_at":"2021-03-02 10:00:00","team":"B"},
{"ID":2,"changed_at":"2021-03-02 10:00:00","team":"A"},
{"ID":3,"changed_at":"2021-03-02 10:30:00","team":"B"},
{"ID":4,"changed_at":"2021-03-02 10:00:00","team":"A"},
{"ID":5,"changed_at":"2021-03-02 11:30:00","team":"B"},
{"ID":6,"changed_at":"2021-03-02 10:00:00","team":"A"},
{"ID":7,"changed_at":"2021-03-02 11:00:00","team":"B"},
{"ID":8,"changed_at":"2021-03-02 10:00:00","team":"A"},
{"ID":9,"changed_at":"2021-03-02 10:00:00","team":"B"},
{"ID":10,"changed_at":"2021-03-02 10:00:00","team":"A"},
{"ID":11,"changed_at":"2021-03-02 10:00:00","team":"B"},
{"ID":12,"changed_at":"2021-03-02 10:00:00","team":"A"},
{"ID":13,"changed_at":"2021-03-02 10:00:00","team":"B"},
{"ID":14,"changed_at":"2021-03-02 12:30:00","team":"A"},
{"ID":15,"changed_at":"2021-03-02 10:00:00","team":"B"},
{"ID":16,"changed_at":"2021-03-02 12:00:00","team":"A"},
{"ID":17,"changed_at":"2021-03-02 10:00:00","team":"B"},
{"ID":18,"changed_at":"2021-03-02 13:30:00","team":"A"},
{"ID":19,"changed_at":"2021-03-02 10:00:00","team":"B"},
{"ID":20,"changed_at":"2021-03-02 10:00:00","team":"A"}]
Solution
Thanks everyone!
session.query(TeamHistory)\
.group_by(TeamHistory.team)\
.having(func.max(TeamHistory.changed_at))\
.all()
When you use:
SELECT *
FROM tablename
GROUP BY somecolumn
SQLite returns 1 row for each distinct value of somecolumn
, but which row?
The documentation states that the row is undefined, meaning it is chosen arbitrarily, although from my experience it seems that the 1st row in the resultset belonging to each group will be returned.
But this is not guaranteed and queries like the one above and your query should be avoided.
There are a number of ways to get for each team
the row with the latest changed_at
.
One of them, which is valid in SQLite (although it would not work in other databases) is:
SELECT * FROM team_history GROUP BY team HAVING MAX(changed_at)
See the demo.
So, this is the query that you should translate to SQLAlchemy (I can't help you with that).
There are other ways also, using window functions, or EXISTS
.