Search code examples
pythonmysqlsqlsqlalchemygreatest-n-per-group

SQLAlchemy: How to use group_by() correctly (only_full_group_by)?


I'm trying to use the group_by() function of SQLAlchemy with the mysql+mysqlconnector engine:

rows = session.query(MyModel) \
        .order_by(MyModel.published_date.desc()) \
        .group_by(MyModel.category_id) \
        .all()

It works fine with SQLite, but for MySQL I get this error:

[42000][1055] Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column '...' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I know how to solve it in plain SQL, but I'd like to use the advantages of SQLAlchemy.

What's the proper solution with SQLAlchemy?

Thanks in advance


Solution

  • One way to form the query with well defined behaviour would be to use a LEFT JOIN, looking for MyModel rows per category_id that have no matching row with greater published_date:

    my_model_alias = aliased(MyModel)
    
    rows = session.query(MyModel).\
        outerjoin(my_model_alias,
                  and_(my_model_alias.category_id == MyModel.category_id,
                       my_model_alias.published_date > MyModel.published_date)).\
        filter(my_model_alias.id == None).\
        all()
    

    This will work in about any SQL DBMS. In SQLite 3.25.0 and MySQL 8 (and many others) you could use window functions to achieve the same:

    sq = session.query(
            MyModel,
            func.row_number().
                over(partition_by=MyModel.category_id,
                     order_by=MyModel.published_date.desc()).label('rn')).\
        subquery()
    
    my_model_alias = aliased(MyModel, sq)
    
    rows = session.query(my_model_alias).\
        filter(sq.c.rn == 1).\
        all()
    

    Of course you could use GROUP BY as well, if you then use the results in a join:

    max_pub_dates = session.query(
            MyModel.category_id,
            func.max(MyModel.published_date).label('published_date')).\
        group_by(MyModel.category_id).\
        subquery()
    
    rows = session.query(MyModel).\
        join(max_pub_dates,
             and_(max_pub_dates.category_id == MyModel.category_id,
                  max_pub_dates.published_date == MyModel.published_date)).\
        all()