Search code examples
mysqlsqlgroup-byhavingany

In a grouped SQL query, efficiently discard groups that have or do not have some values in a column


I am trying to construct a query that aggregates records in a table while filtering out groups based on constraints that involve the presence or absence of certain values in one of the columns. Here is some example data:

CREATE TABLE test (
person_id smallint,
position_id smallint
);

INSERT INTO test 
VALUES (1, 30), (1, 99), (1, 98), (2, 98), (2, 99), (3, 30), (3, 28);

SELECT * FROM test;
+-----------+-------------+
| person_id | position_id |
+-----------+-------------+
|         1 |          30 |
|         1 |          99 |
|         1 |          98 |
|         2 |          98 |
|         2 |          99 |
|         3 |          30 |
|         3 |          28 |
+-----------+-------------+

I want to aggregate this over person_id, but only for persons who have position 30, and do not have position 28 (for example). The correct query result should be:

+-----------+------------+
| person_id | positions  |
+-----------+------------+
|         1 | 30, 99, 98 |
+-----------+------------+

The question is, how to do this efficiently? The actual table I will be doing this on is larger.

I have two working queries that get the correct result:

SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id 
HAVING Sum(CASE WHEN position_id = 30 THEN 1 ELSE 0 END) > 0
AND Sum(CASE WHEN position_id = 28 THEN 1 ELSE 0 END) = 0;

SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id 
HAVING Max(position_id = 30) = 1
AND Max(position_id = 28) = 0;

However, it seems to me that it shouldn't be necessary to actually do the full aggregations (with Sum() or Max()) for each group, like these queries do, and that it would be more efficient to reformulate this using logical 'any' conditions. E.g.

  • the first time I encounter a '30' position_id, I have satisfied the first condition;
  • the first time I encounter a '28' position_id, I have failed the second condition;

there is no need to proceed through the rest of the position_id's for a group after that. However, I'm not sure how to do that, and maybe I am in any case on the wrong track here.

This is using MySQL 8.


Solution

  • You can try with EXISTS and NOT EXISTS

    SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
    from test t
    WHERE EXISTS  ( SELECT person_id 
                         FROM test t1 
                         WHERE t.person_id=t1.person_id
                         AND t1.position_id=30 
                       ) 
    AND  NOT EXISTS  (  SELECT person_id 
                         FROM test t2 
                         WHERE t.person_id=t2.person_id
                         AND t2.position_id=28  )
    GROUP BY person_id ;
    

    Result:

    person_id positions
        1     30, 99, 98
    

    Demo