I want to select only those identifiers from a table which are contained in another subquery result and only top-2 rows must be used.
For example you can run the query on PG:
select a.rid
from(
values (41), (42), (43), (44)
) a(rid)
where a.rid in
(
select b.x
from(
values (43), (44), (42), (41)
) b(x)
order by b.x
fetch first 2 rows only
);
It should return
41
42
It works. However if I use the following query with CTE's
select rid
from(
values (41), (42), (43), (44)
) s(rid)
where rid in
(
with recursive messages as (
select
rid as "root_id",
parent.id,
parent.sent_date
from
purchase_message parent
where
parent.id = rid
union
select
rid as "root_id",
child.id,
child.sent_date
from
purchase_message child
join messages on
child.parent_message_id = messages.id
),
ordered_root_ids as (
select
m."root_id",
max(m."sent_date") over (partition by m."root_id") as "maxSentDate"
from
messages m
order by "maxSentDate" desc
)
select "root_id" from ordered_root_ids
fetch first 2 rows only
)
it doesn't work. I get all the results
41, 42, 43, 44
Why doesn't it return only 2 results? According to my expectations it must be only 2 results, and it must be 43 and 44 because they are the latest messages.
I don't know how to provide all the related tables because there are too many. I hope they aren't necessary. Does CTE or recursive CTE or nested query affect this? Or combination of all of them? How to modify the above query so that it sorts data in the subquery, takes 2 top rows and then they are used in the in
clause of the outer query?
Does CTE or recursive CTE or nested query affect this? Or combination of all of them?
It doesn't. There's no way for limit 2
and fetch first 2 rows only
to return more than just the two rows. Still, you're getting 2 rows for each value that you're checking, so it's more than enough. This:
select rid
from(
values (41), (42), (43), (44)
) s(rid)
where rid in
(--parenthesized subexpression
Will run the parenthesized subexpression for each rid
independently, like a lateral
subquery, because you referenced the outer rid
inside there. As a result, for each value you check, you're getting up to two matches. I'm guessing you expected to only get the 2 matches once, and that the in()
would compare each value to those two.
You can move your inputs to a CTE, then reference all its rid
s at once from your subquery, have them all descend down the hierarchy, get sorted together then filtered down together to the top 2. Demo at db<>fiddle:
with inputs(rid) as (values (41), (42), (43), (44) )
select rid
from inputs join
( with recursive messages as (
select parent.id as "root_id",
parent.id,
parent.sent_date
from purchase_message parent
where parent.id in (select rid from inputs)
union
select parent."root_id",
child.id,
child.sent_date
from purchase_message child
join messages parent
on child.parent_message_id = parent.id
)
select m."root_id" as rid
from messages m
group by m."root_id"
order by max(m."sent_date") desc
fetch first 2 rows only
)q using(rid);
rid |
---|
43 |
44 |
I also had to change your logic that selects the top two root_id
based on the most recent sent_date
in their hierarchy. Once it's allowed to operate on all input rid
's, your code would keep picking 2 of the same rid
:
ordered_root_ids as (
select
m."root_id",
max(m."sent_date") over (partition by m."root_id") as "maxSentDate"
from
messages m
order by "maxSentDate" desc
)
select "root_id" from ordered_root_ids
fetch first 2 rows only
because of your use of the window function, you'd get all rows of each hierarchy, each repeating their max(sent_date)
. As long as the hierarchy with the most recent sent_date
has 2 or more rows in it, you will get those two, because they both have the same max(sent_date)
and the same rid
.