Search code examples
selectdatabase-designsqlalchemyaggregate-functions

Is there something like func.group_concat() in SQLALchemy Core that will return a list of objects, rather than a string?


Is there a convention/is it possible to return an array/list of nested objects within a column, e.g. from joined many-to-many tables?

For example, say I have three normalised tables: "movies", "actors", and a junction table "actors_in_movies". I want to retrieve multiple movies, one movie per row, with a list of actors for that movie within its row.

A basic SELECT statement on the joined tables would retrieve:

"Bill & Ted's Excellent Adventure", "Keanu Reeves"
"Bill & Ted's Excellent Adventure", "Alex Winter"
"Bill & Ted's Excellent Adventure", "George Carlin"
"The Matrix", "Keanu Reeves"
"The Matrix", "Carrie-Anne Moss"
"The Matrix", "Lawrence Fishbourne"

Using func.group_concat() I could retrieve:
"Bill & Ted's Excellent Adventure", "Keanu Reeves, Alex Winter, George Carlin"
"The Matrix", "Keanu Reeves, Carrie-Anne Moss, Lawrence Fishbourne"

And with GROUP_BY and an aggregate function like COUNT I could retrieve:
"Bill & Ted's Excellent Adventure", 3
"The Matrix", 3

But does SQL (and SQL Alchemy Core by extension) support returning something where I can access nested aggregated results, like:
"Bill & Ted's Excellent Adventure", ["Keanu Reeves", "Alex Winter", "George Carlin"]
"The Matrix", ["Keanu Reeves", "Carrie-Anne Moss", "Lawrence Fishbourne"]

I understand this is a primary use case for SQLAlchemy ORM - is there a conventional way to manage this using pure SQL/SQL Alchemy Core? Does it require multiple queries? Or is it usually left to the application layer to manage?


Solution

  • is there a conventional way to manage this using pure SQL/SQL Alchemy Core?

    In short: No, there is no such database-independent mechanism in Core. That's what ORM is for.

    However, Core can take advantage of a such a feature in a specific database. For example, PostgreSQL has array_agg():

    from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, create_engine, func, insert, select
    
    engine = create_engine("postgresql://scott:tiger@192.168.0.199/test")
    
    meta = MetaData()
    
    movie = Table(
        "movie",
        meta,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("title", String, nullable=False),
    )
    
    actor = Table(
        "actor",
        meta,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("name", String, nullable=False),
    )
    
    actor_in_movie = Table(
        "actor_in_movie",
        meta,
        Column("actor_id", Integer, ForeignKey("actor.id"), primary_key=True),
        Column("movie_id", Integer, ForeignKey("movie.id"), primary_key=True),
    )
    
    meta.drop_all(engine)
    meta.create_all(engine)
    
    # populate example data
    with engine.begin() as conn:
        conn.execute(
            insert(movie),
            [
                dict(id=1, title="Bill & Ted's Excellent Adventure"),
            ],
        )
        conn.execute(
            insert(actor),
            [
                dict(id=1, name="Reeves, Keanu"),
                dict(id=2, name="Carlin, George"),
            ],
        )
        conn.execute(
            insert(actor_in_movie),
            [
                dict(actor_id=1, movie_id=1),
                dict(actor_id=2, movie_id=1),
            ],
        )
    
    with engine.begin() as conn:
        qry = (
            select(movie.c.title, func.array_agg(actor.c.name).label("actors"))
            .select_from(actor)
            .join(actor_in_movie)
            .join(movie)
            .group_by(movie.c.title)
        )
        result = conn.execute(qry)
        for row in result:
            print(row)
            # ("Bill & Ted's Excellent Adventure", ['Reeves, Keanu', 'Carlin, George'])
            print(type(row.actors))  # <class 'list'>
            print(type(row.actors[0]))  # <class 'str'>