I have two relevant tables in MySQL that look something like this:
marker
table
id | SEASONCD | ITEMCD | PRICETYPECD |
---|---|---|---|
1 | foo | bar | baz |
2 | foo | bar | baz |
3 | foo | bar | baz |
4 | qux | bar | baz |
5 | qux | bar | baz |
6 | spam | eggs | ham |
seat_marker
table
id | MARKER_ID |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 6 |
I would like to select all of the rows in the marker
table that are duplicates, excluding the one that is most frequently referenced / joined by the seat_marker
table. In other words, I would like to list all duplicates excluding the "original", where "original" is defined as the row with the most occurrences in seat_marker.MARKER_ID
.
The result would look like this:
id | SEASONCD | ITEMCD | PRICETYPECD |
---|---|---|---|
2 | foo | bar | baz |
3 | foo | bar | baz |
5 | qux | bar | baz |
We can see this is the original data from the marker
table, but with three rows removed;
seat_marker
than the other foo-bar-baz rows.seat_marker
, I don't have a preference if we remove id 4 or id 5. I chose id 4 at random.The closest I've been able to get is this, which lists all values that are a duplicate and the number of assignments it has. But does not exclude the "originals".
-- Select all duplicates, and list the number of times it's referenced in the seat_marker table
SELECT
n1.id, n1.SEASONCD, n1.ITEMCD, n1.PRICETYPECD, (
SELECT COUNT(1)
FROM seat_marker
WHERE MARKER_ID = n1.id
) active_assignments
FROM marker n1, marker n2
WHERE
n1.id <> n2.id AND
n1.SEASONCD = n2.SEASONCD AND
n1.ITEMCD = n2.ITEMCD AND
n1.PRICETYPECD = n2.PRICETYPECD
GROUP BY id, SEASONCD, ITEMCD, PRICETYPECD, active_assignments
ORDER BY SEASONCD, ITEMCD, PRICETYPECD, active_assignments DESC, id;
The expected result included in your question appears to be incorrect. It should be displaying 4
instead of 5
, as 5
has one corresponding row in seat_marker
and is therefor the "row with the most occurrences".
One way to do this is with COUNT and ROW_NUMBER()
:
WITH cte AS (
SELECT m.*,
ROW_NUMBER() OVER (PARTITION BY SEASONCD, ITEMCD, PRICETYPECD ORDER BY COUNT(sm.id) DESC, m.id) AS rn
FROM marker m
LEFT JOIN seat_marker sm ON m.id = sm.MARKER_ID
GROUP BY m.id
)
SELECT * FROM cte WHERE rn > 1;
Output:
id | SEASONCD | ITEMCD | PRICETYPECD | rn |
---|---|---|---|---|
2 | foo | bar | baz | 2 |
3 | foo | bar | baz | 3 |
4 | qux | bar | baz | 2 |
Here's a db<>fiddle.