I am trying to construct a query that aggregates records in a table while filtering out groups based on constraints that involve the presence or absence of certain values in one of the columns. Here is some example data:
CREATE TABLE test (
person_id smallint,
position_id smallint
);
INSERT INTO test
VALUES (1, 30), (1, 99), (1, 98), (2, 98), (2, 99), (3, 30), (3, 28);
SELECT * FROM test;
+-----------+-------------+
| person_id | position_id |
+-----------+-------------+
| 1 | 30 |
| 1 | 99 |
| 1 | 98 |
| 2 | 98 |
| 2 | 99 |
| 3 | 30 |
| 3 | 28 |
+-----------+-------------+
I want to aggregate this over person_id, but only for persons who have position 30, and do not have position 28 (for example). The correct query result should be:
+-----------+------------+
| person_id | positions |
+-----------+------------+
| 1 | 30, 99, 98 |
+-----------+------------+
The question is, how to do this efficiently? The actual table I will be doing this on is larger.
I have two working queries that get the correct result:
SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id
HAVING Sum(CASE WHEN position_id = 30 THEN 1 ELSE 0 END) > 0
AND Sum(CASE WHEN position_id = 28 THEN 1 ELSE 0 END) = 0;
SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id
HAVING Max(position_id = 30) = 1
AND Max(position_id = 28) = 0;
However, it seems to me that it shouldn't be necessary to actually do the full aggregations (with Sum()
or Max()
) for each group, like these queries do, and that it would be more efficient to reformulate this using logical 'any' conditions. E.g.
there is no need to proceed through the rest of the position_id's for a group after that. However, I'm not sure how to do that, and maybe I am in any case on the wrong track here.
This is using MySQL 8.
You can try with EXISTS
and NOT EXISTS
SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
from test t
WHERE EXISTS ( SELECT person_id
FROM test t1
WHERE t.person_id=t1.person_id
AND t1.position_id=30
)
AND NOT EXISTS ( SELECT person_id
FROM test t2
WHERE t.person_id=t2.person_id
AND t2.position_id=28 )
GROUP BY person_id ;
Result:
person_id positions 1 30, 99, 98