Search code examples
mysqlgroup-bydistinct

MySQL group by null and not null values


I have a table like this:

id | cluster_id | user_id | name      | ...
1  | 1          | 1       | test name
2  | 1          | 3       | other
3  | null       | 1       | one more
4  | 2          | 1       | foo
5  | null       | 1       | bar
6  | 1          | 1       | baz

I'd like to create a query that groups by the cluster_id column but only group by the columns that have a not null value so that I get something like this:

id | cluster_id | user_id | ...
1  | 1          | 1       | test name
3  | null       | 1       | one more
4  | 2          | 1       | foo
5  | null       | 1       | bar

I'd like to a list with distinct cluster_ids, but only where the cluster_id is not null. Also I'd like to filter on arbitrary columns like user_id.

In the above result I also query on the user_id, where the user_id is 1.

How do I create such a query?

Thanks in advance!


Solution

  • The query is simple.

    The GROUP BY also works with NULL Values

    I made two queries, the first includes the user_id the last not

    You must test this with id as primary key, and see if excluding the NULL brings some performance

    CREATE TABLE tab1 (
      `id` INTEGER,
      `cluster_id` int,
      `user_id` INTEGER,
      `name` VARCHAR(20)
    );
    
    INSERT INTO tab1
      (`id`, `cluster_id`, `user_id`, `name`)
    VALUES
      ('1', '1', '1', 'test name'),
      ('2', '1', '3', 'other'),
      ('3', null, '1', 'one more'),
      ('4', '2', '1', 'foo'),
      ('5', null, '1', 'bar'),
      ('6', '1', '1', 'baz');
    
    SELECT * FROM  tab1 WHERE `id` IN (SELECT MIN(`id`) FROM tab1 GROUP BY `cluster_id`,`user_id`)
    UNION 
    SELECT * FROM tab1 WHERE `cluster_id` IS NULL
    
    id | cluster_id | user_id | name     
    -: | ---------: | ------: | :--------
     1 |          1 |       1 | test name
     2 |          1 |       3 | other    
     3 |       null |       1 | one more 
     4 |          2 |       1 | foo      
     5 |       null |       1 | bar      
    
    SELECT * FROM  tab1 WHERE `id` IN (SELECT MIN(`id`) FROM tab1 WHERE `cluster_id` IS NOT NULL GROUP BY `cluster_id`,`user_id`)
    UNION 
    SELECT * FROM tab1 WHERE `cluster_id` IS NULL
    
    id | cluster_id | user_id | name     
    -: | ---------: | ------: | :--------
     1 |          1 |       1 | test name
     2 |          1 |       3 | other    
     4 |          2 |       1 | foo      
     3 |       null |       1 | one more 
     5 |       null |       1 | bar      
    
    SELECT * FROM  tab1 WHERE `id` IN (SELECT MIN(`id`) FROM tab1 GROUP BY `cluster_id`)
    UNION 
    SELECT * FROM tab1 WHERE `cluster_id` IS NULL
    
    id | cluster_id | user_id | name     
    -: | ---------: | ------: | :--------
     1 |          1 |       1 | test name
     3 |       null |       1 | one more 
     4 |          2 |       1 | foo      
     5 |       null |       1 | bar      
    
    SELECT * FROM  tab1 WHERE `id` IN (SELECT MIN(`id`) FROM tab1 WHERE `cluster_id` IS NOT NULL GROUP BY `cluster_id`)
    UNION 
    SELECT * FROM tab1 WHERE `cluster_id` IS NULL
    
    id | cluster_id | user_id | name     
    -: | ---------: | ------: | :--------
     1 |          1 |       1 | test name
     4 |          2 |       1 | foo      
     3 |       null |       1 | one more 
     5 |       null |       1 | bar      
    

    db<>fiddle here