Search code examples
pythonsqlpeewee

Multiple aggregate functions with peewee


I barely know SQL but I have been using peewee and so far it has worked really well. I fell upon a problem I'm not sure how to solve. Here are the settings.

I have two tables A and B. A, among other fields, has multiple B with a foreign key. B has 3 fields (5 in my case, but 3 is the minimum to reproduce):

A:
id param_1 ...
0  ...
1  ...
2  ...

and

B:
id A_key x  y  z
0  0     3. 5. 1
1  0     4. 4. 2
2  1     2. 3. 3
3  1     5. 1. 4    

I want to get the z which corresponds to the minimum across x on one hand, y on the other hand, for each A. Doing so for x or y is easy: (Q1)

A.select(fn.MIN(B.x), B.z.alias('z_x')).join(B).group_by(A)
> A_id x  z_x
> 0    3. 1
> 1    2. 3

and I can do likewise for B. My question is: what would you do to have both? (Q2)

???
> A_id x  z_x  y  z_y 
> 0    3. 1    4. 2
> 1    2. 3    1. 4

Of course, I tried doing

A.select(fn.MIN(B.x), B.z.alias('z_x'), fn.MIN(B.y), B.z.alias('z_y')).join(B).group_by(A)

but the column z_x is just replaced by z_y.

For the SQL specialists out there, (Q1) corresponds to

SELECT MIN("t1"."x"), "t1"."z" FROM "A" AS "t2" INNER JOIN "B" AS "t1" ON ("t1"."A_id" = "t2"."id") GROUP BY "t2"."id"

I can also translate a SQL query for (Q2) in peewee if someone sees the solution in pure SQL!

Happy for any help, and I can provide more details if needed!


Solution

  • This ended up being harder than I thought initially because Sqlite is kinda making the simplified versions of the query work... However they're not quite correct. At any rate, the following seems working correctly, at least as far as I understand the question (get the Z's for the min X and min Y, for each A):

    class A(Base):
        key = TextField()
    
    class B(Base):
        a = ForeignKeyField(A)
        x = IntegerField()
        y = IntegerField()
        z = IntegerField()
    
    db.create_tables([A, B])
    
    a1 = A.create(key='a1')
    a2 = A.create(key='a2')
    
    B.create(a=a1, x=1, y=100, z=10)
    B.create(a=a1, x=10, y=10, z=5)
    B.create(a=a1, x=100, y=1, z=2)
    
    B.create(a=a2, x=2, y=200, z=20)
    B.create(a=a2, x=20, y=20, z=15)
    B.create(a=a2, x=200, y=4, z=10)
    
    # The idea is we will create first a query that gets us
    # the minimum X for each A (subq).
    BX = B.alias()
    subq = BX.select(BX.a, fn.MIN(BX.x).alias('minx')).group_by(BX.a)
    
    # Then we want to get the Z's for all rows whose X is
    # equal to the minimum X (for each A).
    z_for_x = (BX
               .select(BX.a, BX.z, subq.c.minx)
               .join(subq, on=((BX.a == subq.c.a_id) & (BX.x == subq.c.minx)))
               .group_by(BX.a, BX.z))
    
    # Do the same for the Y's.
    # First find the minimum Y for each A.
    BY = B.alias()
    subq = BY.select(BY.a, fn.MIN(BY.y).alias('miny')).group_by(BY.a)
    
    # Then get all Z's for each A whose Y is equal to the
    # minimum Y we found.
    z_for_y = (BY
               .select(BY.a, BY.z, subq.c.miny)
               .join(subq, on=((BY.a == subq.c.a_id) & (BY.y == subq.c.miny)))
               .group_by(BY.a, BY.z))
    
    # We'll use common table expressions to access these
    # two distinct subqueries.
    xcte = z_for_x.cte('zforx', columns=['ba', 'bz', 'minx'])
    ycte = z_for_y.cte('zfory', columns=['ba', 'bz', 'miny'])
    
    # Then get all A's and their associated minimum X and it's
    # corresponding Z, and minimum Y and it's corresponding Z:
    q = (A.select(A, xcte.c.minx, xcte.c.bz, ycte.c.miny, ycte.c.bz)
         .join_from(A, xcte, JOIN.INNER, on=(A.id == xcte.c.ba))
         .join_from(A, ycte, JOIN.INNER, on=(A.id == ycte.c.ba))
         .with_cte(xcte, ycte))
    
    # Execute our query:
    for row in q.tuples():
        print(row)
    
    
    #OUTPUT:
    # 1, 'a1', 1, 10, 1, 2
    # 2, 'a2', 2, 20, 4, 10
    

    Explanation of output:

    For a1, the minimum X=1 (with z=10), and minimum Y=1 (with z=2)

    For a2, the minimum X=2 (with z=20), and minimum Y=4 (with z=10)