Search code examples
mysqlaggregate

Group, then "ungroup" rows with additional columns giving the group information


Context

Let's say we have this table data (see at the end for a ready-ot-use request to create it):

+--+-----+-----+--------+
|id|name |color|shape   |
+--+-----+-----+--------+
|1 |john |blue |square  |
|2 |mary |green|square  |
|3 |anna |red  |triangle|
|4 |bob  |blue |square  |
|5 |susan|blue |square  |
|6 |frank|red  |triangle|
+--+-----+-----+--------+

With this request, it's possible to group rows by color and shape, and more importantly to add aggregation information:

SELECT
    GROUP_CONCAT(name) AS names,
    color,
    shape,
    COUNT(*) AS nb_duplicates
FROM temp_users
GROUP BY color, shape;

Result:

+--------------+-----+--------+-------------+
|names         |color|shape   |nb_duplicates|
+--------------+-----+--------+-------------+
|john,bob,susan|blue |square  |3            |
|mary          |green|square  |1            |
|anna,frank    |red  |triangle|2            |
+--------------+-----+--------+-------------+

Problem

But how is it possible to "ungroup" the rows, in order to have:

  • one row per user (at least with its id, the rest can be joined);
  • the information added after the grouping, especially nb_duplicates and a unique group id (maybe auto-incremented) ?

Expected output

+--+-----+-----+--------+-------------+------------------+
|id|name |color|shape   |nb_duplicates|duplicate_group_id|
+--+-----+-----+--------+-------------+------------------+
|1 |john |blue |square  |3            |1                 |
|2 |mary |green|square  |1            |2                 |
|3 |anna |red  |triangle|2            |3                 |
|4 |bob  |blue |square  |3            |1                 |
|5 |susan|blue |square  |3            |1                 |
|6 |frank|red  |triangle|2            |3                 |
+--+-----+-----+--------+-------------+------------------+

Similar question

I found a similar question Mysql: count, group by yet return all results, I tried the proposition:

SELECT
    u.*,
    dups.nb_duplicates
FROM temp_users u
INNER JOIN (
    SELECT
        u2.color,
        u2.shape,
        COUNT(*) AS nb_duplicates
    FROM temp_users u2
    GROUP BY color, shape
) AS dups ON u.color = dups.color AND u.shape = dups.shape;

But I got this error from MySql:

[HY000][1137] Can't reopen table: 'u'

Example table creation request

Just for those who want to quicky reproduce the table:

DROP TABLE IF EXISTS temp_users;
CREATE TEMPORARY TABLE temp_users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(20),
    color VARCHAR(20),
    shape VARCHAR(20)
);
INSERT INTO temp_users(name, color, shape) VALUES
('john',  'blue',  'square'),
('mary',  'green', 'square'),
('anna',  'red',   'triangle'),
('bob',   'blue',  'square'),
('susan', 'blue',  'square'),
('frank', 'red',   'triangle');

Solution

  • SELECT *,
           COUNT(*) OVER (PARTITION BY color, shape) nb_duplicates,
           DENSE_RANK() OVER (ORDER BY color, shape) duplicate_group_id
    FROM temp_users
    ORDER BY id;
    
    id name color shape nb_duplicates duplicate_group_id
    1 john blue square 3 1
    2 mary green square 1 2
    3 anna red triangle 2 3
    4 bob blue square 3 1
    5 susan blue square 3 1
    6 frank red triangle 2 3

    fiddle