All the research I've done on this issue says a previously sorted result-set is the way to control the GROUP BY behavior, but it simply doesn't work here.
I'm writing code to clean up duplicate values, and I need to save the highest item_state_c ID for each campaign_main_ref_id only when there's duplicates. As you can see only campaigns 92 and 94 have multiple items. The issue is no matter what I do, I can never get a GROUP BY result to return the ID based on the highest (DESC) sort order of item_state_c. For clarity, here's the source table:
The problem is it will never return ID 45893 for campaign 92 based on the item_state_c = 3. As you can see I order by item_state_c DESC in my CTE reference - but the GROUP BY is ignoring that.
So how can I get the GROUP BY use in this scenario to select 45893 for campaign 92 due to the item state?
Here's the code that generates the table (which is campaign_items) and attempts to group and leverage the previous sort (which doesn't work - not sure why):
WITH
frontload_array AS
(
SELECT jt.ID
FROM JSON_TABLE('[182723,182722,182721,178216,2]', '$[*]' COLUMNS(ID int PATH '$')) AS jt
),
campaign_items AS
(
SELECT ci.ID, ci.campaign_main_ref_id, ci.item_state_c
FROM campaign_item ci
WHERE ci.item_object_type_sc = 4
AND ci.item_object_subtype_sc = 1
AND ci.item_object_ref_id IN (SELECT ID FROM frontload_array)
ORDER BY ci.item_state_c DESC
),
campaign_duplicates AS
(
SELECT ci.ID, ci.item_state_c, ci.campaign_main_ref_id
FROM campaign_items ci
GROUP BY ci.campaign_main_ref_id
HAVING COUNT(ci.campaign_main_ref_id) > 1
)
SELECT * FROM campaign_duplicates;
Here's the result of campaign_duplicates:
The problem is it's not leveraging the previous ORDER BY in this GROUP, which is what unanimously appears to be the "solution" to my problem - but it simply doesn't work (as you can see).
What I need is a single SQL call that returns the IDs with the highest item_state_c for each campaign that has multiple items. I don't know how to do this without GROUP BY, and based on my understanding, ORDER BY only sorts data after GROUP BY, so I'm at a loss; especially since setting the order in the reference table I GROUP on doesn't seem to impact anything. I simply don't understand how GROUP BY determines the ID it's choosing as it's a middle ID number.
Any help would be appreciated.
Doing ORDER BY in subqueries may be ignored.
https://dev.mysql.com/doc/refman/8.0/en/derived-table-optimization.html
The optimizer propagates an ORDER BY clause in a derived table or view reference to the outer query block if these conditions are all true:
The outer query is not grouped or aggregated.
The outer query does not specify DISTINCT, HAVING, or ORDER BY.
The outer query has this derived table or view reference as the only source in the FROM clause.
Otherwise, the optimizer ignores the ORDER BY clause.
In your case, you are aggregating, so the ORDER BY will be eliminated.
Since you are apparently using MySQL 8.0, you should use a window function:
WITH
frontload_array AS
(
SELECT jt.ID
FROM JSON_TABLE('[182723,182722,182721,178216,2]', '$[*]' COLUMNS(ID int PATH '$')) AS jt
),
campaign_items AS
(
SELECT ci.ID, ci.campaign_main_ref_id, ci.item_state_c,
ROW_NUMBER() OVER (PARTITION BY ci.campaign_main_ref_id ORDER BY ci.item_state_c DESC) AS rownum
FROM campaign_item ci
WHERE ci.item_object_type_sc = 4
AND ci.item_object_subtype_sc = 1
AND ci.item_object_ref_id IN (SELECT ID FROM frontload_array)
),
SELECT ci.ID, ci.item_state_c, ci.campaign_main_ref_id
FROM campaign_items ci
WHERE ci.rownum = 1;