Here is (an extremely simplified version of) my problem.
I'm using Postgresql as the backend and trying to build a sqlalchemy query from another query.
Here are the tables with some random data for the example. You can assume that each table was declared in sqlalchemy declaratively, with the name of the mappers being respectively Item and ItemVersion. At the end of the question you can find a link where I put the code for everything in this question, including the table definitions.
Some items.
item
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
A table containing versions of each item. Each has at least one.
item_version
+----+---------+---------+-----------+
| id | item_id | version | text |
+----+---------+---------+-----------+
| 1 | 1 | 0 | item_1_v0 |
| 2 | 1 | 1 | item_1_v1 |
| 3 | 2 | 0 | item_2_v0 |
| 4 | 3 | 0 | item_3_v0 |
+----+---------+---------+-----------+
Now, for a given sqlalchemy query over Item
, I want a function that returns
another query, but this time over (Item, ItemVersion)
, where the Item
s are
the same as in the original query (and in the same order!), and where the
ItemVersion
are the corresponding latest versions for each Item
.
Here is an example in SQL, which is pretty straightforward:
First a random query over the item
table
SELECT item.id as item_id
FROM item
WHERE item.id != 2
ORDER BY item.id DESC
which corresponds to
+---------+
| item_id |
+---------+
| 3 |
| 1 |
+---------+
Then from that query, if I want to join the right version
s, I can do
SELECT sq2.item_id AS item_id,
sq2.item_version_id AS item_version_id,
sq2.item_version_text AS item_version_text
FROM (
SELECT DISTINCT ON (sq.item_id)
sq.item_id AS item_id,
iv.id AS item_version_id,
iv.text AS item_version_text
FROM (
SELECT item.id AS item_id
FROM item
WHERE id != 2
ORDER BY id DESC) AS sq
JOIN item_version AS iv
ON iv.item_id = sq.item_id
ORDER BY sq.item_id, iv.version DESC) AS sq2
ORDER BY sq2.item_id DESC
Now the challenge is to write a function that does that in sqlalchemy. Here is what I have so far.
First the initial sqlalchemy query over the items:
session.query(Item).filter(Item.id != 2).order_by(desc(Item.id))
Then I'm able to build my second query but without the original ordering. In
other words I don't know how to do the second subquery wrapping that I did in
SQL to get back the ordering that was discarded by the DISTINCT ON
.
def join_version(session, query):
sq = aliased(Item, query.subquery('sq'))
sq2 = session.query(sq, ItemVersion) \
.distinct(sq.id) \
.join(ItemVersion) \
.order_by(sq.id, desc(ItemVersion.version))
return sq2
I think this SO question could be part of the answer but I'm not quite sure how.
The code to run everything in this question (database creation, population and
a failing unit test with what I have so far) can be found here. Normally
if you can fix the join_version
function, it should make the test pass!
Ok so I found a way. It's a bit of a hack but still only queries the database twice so I guess I will survive! Basically I'm querying the database for the Item
s first, and then I do another query for the ItemVersion
s, filtering on item_id
, and then reordering with a trick I found here (this is also relevant).
Here is the code:
def join_version(session, query):
items = query.all()
item_ids = [i.id for i in items]
items_v_sq = session.query(ItemVersion) \
.distinct(ItemVersion.item_id) \
.filter(ItemVersion.item_id.in_(item_ids)) \
.order_by(ItemVersion.item_id, desc(ItemVersion.version)) \
.subquery('sq')
sq = aliased(ItemVersion, items_v_sq)
items_v = session.query(sq) \
.order_by('idx(array{}, sq.item_id)'.format(item_ids))
return zip(items, items_v)