I need to join several tables, then return distinct rows by some rule based on partitions of model C.id (let's use row_number()==1
for simplicity). As it's a window function, it cannot be directly used in where
, so requires an outer query to filter. And it works, but it turns that moving models into a subquery, the Alchemy now returns raw rows, not models objects :(
GPT suggested to use with_entities
, add_entity
but they seem to add repeated from
extractions, full cartesian product breaking the logic instead of simple parsing of already available columns. How can I achieve it?
# Main query
subq = db.query(ModelA, ModelB, ModelC)
subq = subq.filter(ModelA.some_tag == ModelB.some_tag)
subq = subq.filter(ModelA.some_tag == ModelC.some_tag)
subq = subq.filter(ModelA.some_tag == "31415926535") # just to simplify testing
# Additional field by a window function
partition_col = func.row_number().over(partition_by=ModelC.id).label("partition_col_name")
subq = subq.add_column(partition_col)
# Outer query
subq = subq.subquery()
q = db.query(subq).filter(subq.c["partition_col_name"] == 1)
Trying to add extra wrapping to get the objects... But why I get wrong results?? SQL of finalq
is fully correct and returns 2 rows in the psql, but Alchemy sees only the 1st on o_0
from sqlalchemy import text, func
from psycopg2.extensions import adapt as sqlescape
from database import get_db, ModelA, ModelB, ModelC
db = next(get_db())
# Main query
subq = db.query(ModelA, ModelB, ModelC)
subq = subq.filter(ModelA.some_tag == ModelB.some_tag)
subq = subq.filter(ModelA.some_tag == ModelC.some_tag)
subq = subq.filter(ModelA.some_tag == "31415926535") # just to simplify testing
# Additional field by a window function
partition_col = func.row_number().over(partition_by=ModelC.id).label("partition_col_name")
subq = subq.add_column(partition_col)
subq = subq.subquery()
# Outer query
outerq = db.query(subq).filter(subq.c["partition_col_name"] == 1)
# outerq = outerq.limit(128)
# Final query for ORM
compiledq = outerq.statement.compile(dialect=db.bind.dialect)
params = {k: sqlescape(v) for k, v in compiledq.params.items()}
finalq = db.query(ModelA, ModelB, ModelC).from_statement(text(str(compiledq) % params))
print(len(outerq.all())) # 2
print(len(finalq.all())) # 1 ?!
This way is simpler and works fine:
finalq = db.query(ModelA, ModelB, ModelC).from_statement(outerq)
But I still wonder why the previous approach with custom query compilation doesn't work 🤔
You can do this by placing the query within a from_statement call, called from a query that queries the models:
qq = db.query(ModelA, ModelB, ModelC).from_statement(q)