Search code examples
pythonsqlalchemyormwindow-functions

SQLAlchemy: return ORM objects from subquery


I need to join several tables, then return distinct rows by some rule based on partitions of model C.id (let's use row_number()==1 for simplicity). As it's a window function, it cannot be directly used in where, so requires an outer query to filter. And it works, but it turns that moving models into a subquery, the Alchemy now returns raw rows, not models objects :( GPT suggested to use with_entities, add_entity but they seem to add repeated from extractions, full cartesian product breaking the logic instead of simple parsing of already available columns. How can I achieve it?

# Main query
subq = db.query(ModelA, ModelB, ModelC)
subq = subq.filter(ModelA.some_tag == ModelB.some_tag)
subq = subq.filter(ModelA.some_tag == ModelC.some_tag)
subq = subq.filter(ModelA.some_tag == "31415926535")  # just to simplify testing

# Additional field by a window function
partition_col = func.row_number().over(partition_by=ModelC.id).label("partition_col_name")
subq = subq.add_column(partition_col)

# Outer query
subq = subq.subquery()
q = db.query(subq).filter(subq.c["partition_col_name"] == 1)

Trying to add extra wrapping to get the objects... But why I get wrong results?? SQL of finalq is fully correct and returns 2 rows in the psql, but Alchemy sees only the 1st on o_0

from sqlalchemy import text, func
from psycopg2.extensions import adapt as sqlescape
from database import get_db, ModelA, ModelB, ModelC
db = next(get_db())

# Main query
subq = db.query(ModelA, ModelB, ModelC)
subq = subq.filter(ModelA.some_tag == ModelB.some_tag)
subq = subq.filter(ModelA.some_tag == ModelC.some_tag)
subq = subq.filter(ModelA.some_tag == "31415926535")  # just to simplify testing

# Additional field by a window function
partition_col = func.row_number().over(partition_by=ModelC.id).label("partition_col_name")
subq = subq.add_column(partition_col)
subq = subq.subquery()

# Outer query
outerq = db.query(subq).filter(subq.c["partition_col_name"] == 1)
# outerq = outerq.limit(128)

# Final query for ORM
compiledq = outerq.statement.compile(dialect=db.bind.dialect)
params = {k: sqlescape(v) for k, v in compiledq.params.items()}
finalq = db.query(ModelA, ModelB, ModelC).from_statement(text(str(compiledq) % params))

print(len(outerq.all()))  # 2
print(len(finalq.all()))  # 1 ?!

Update

This way is simpler and works fine:

finalq = db.query(ModelA, ModelB, ModelC).from_statement(outerq)

But I still wonder why the previous approach with custom query compilation doesn't work 🤔


Solution

  • You can do this by placing the query within a from_statement call, called from a query that queries the models:

    qq = db.query(ModelA, ModelB, ModelC).from_statement(q)