python mysql sql sqlalchemy greatest-n-per-group

SQLAlchemy: How to use group_by() correctly (only_full_group_by)?

I'm trying to use the group_by() function of SQLAlchemy with the mysql+mysqlconnector engine:

rows = session.query(MyModel) \
        .order_by(MyModel.published_date.desc()) \
        .group_by(MyModel.category_id) \
        .all()

It works fine with SQLite, but for MySQL I get this error:

[42000][1055] Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column '...' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I know how to solve it in plain SQL, but I'd like to use the advantages of SQLAlchemy.

What's the proper solution with SQLAlchemy?

Thanks in advance

Solution

One way to form the greatest-n-per-group query with well defined behaviour would be to use a LEFT JOIN, looking for MyModel rows per category_id that have no matching row with greater published_date:

my_model_alias = aliased(MyModel)

rows = session.query(MyModel).\
    outerjoin(my_model_alias,
              and_(my_model_alias.category_id == MyModel.category_id,
                   my_model_alias.published_date > MyModel.published_date)).\
    filter(my_model_alias.id == None).\
    all()

This will work in about any SQL DBMS. In SQLite 3.25.0 and MySQL 8 (and many others) you could use window functions to achieve the same:

sq = session.query(
        MyModel,
        func.row_number().
            over(partition_by=MyModel.category_id,
                 order_by=MyModel.published_date.desc()).label('rn')).\
    subquery()

my_model_alias = aliased(MyModel, sq)

rows = session.query(my_model_alias).\
    filter(sq.c.rn == 1).\
    all()

Of course you could use GROUP BY as well, if you then use the results in a join:

max_pub_dates = session.query(
        MyModel.category_id,
        func.max(MyModel.published_date).label('published_date')).\
    group_by(MyModel.category_id).\
    subquery()

rows = session.query(MyModel).\
    join(max_pub_dates,
         and_(max_pub_dates.category_id == MyModel.category_id,
              max_pub_dates.published_date == MyModel.published_date)).\
    all()