Search code examples
mysqlselectalter

Picking out specific values from a group in MySQL


This seems like such a simple problem, but I can't find a good solution. I'm trying to select information from a slightly misformatted table. Basically, wherever sequence=0, the person_id should actually be a company_id. This company_id then applies to all the rows which have the same group_id.

Someone thought it was a good idea to format things this way instead of simply having a company_id column, but it makes trying to select by company very difficult. It would make my programming much easier to simply add this extra column, and fix the formatting.

I want to turn something like this:

+----------+------------+-----------+----------+
| group_id | date       | person_id | sequence |
+----------+------------+-----------+----------+
|        1 | 2012-08-31 |        10 |        0 |
|        1 | 2012-08-31 |        11 |        1 |
|        1 | 2012-08-31 |        12 |        2 |
|        2 | 1999-04-16 |        10 |        0 |
|        2 | 1999-04-16 |        21 |        1 |
|        2 | 1999-04-16 |        22 |        2 |
|        2 | 1999-04-16 |        23 |        3 |
|        2 | 1999-04-16 |        24 |        4 |
|        3 | 2001-01-09 |        30 |        0 |
|        3 | 2001-01-09 |        31 |        1 |
|        3 | 2001-01-09 |        11 |        2 |
|        3 | 2001-01-09 |        12 |        3 |
+----------+------------+-----------+----------+

Into this:

+------------+----------+------------+-----------+----------+
| company_id | group_id | date       | person_id | sequence |
+------------+----------+------------+-----------+----------+
|         10 |        1 | 2012-08-31 |        11 |        1 |
|         10 |        1 | 2012-08-31 |        12 |        2 |
|         10 |        2 | 1999-04-16 |        21 |        1 |
|         10 |        2 | 1999-04-16 |        22 |        2 |
|         10 |        2 | 1999-04-16 |        23 |        3 |
|         10 |        2 | 1999-04-16 |        24 |        4 |
|         30 |        3 | 2001-01-09 |        31 |        1 |
|         30 |        3 | 2001-01-09 |        11 |        2 |
|         30 |        3 | 2001-01-09 |        12 |        3 |
+------------+----------+------------+-----------+----------+

The only way I can think of how to achieve this is with nested SELECT statements, which are very inefficient considering I have about 100M rows. It's a one time fix though, so I don't mind letting it run overnight.


Solution

  • If you permanently want to change your table to include a company_id column then do this:

    First alter the table and add the new column:

    alter table your_table add company_id int;
    

    Then update all rows to set the company to the person_id = 0 for the group:

    UPDATE your_table a 
    JOIN your_table b ON a.group_id = b.group_id  
    SET a.company_id = b.person_id
    WHERE b.sequence = 0;
    

    And finally remove the rows with sequence = 0:

    DELETE FROM your_table WHERE sequence = 0;
    

    Sample SQL Fiddle

    The end result will be:

    | group_id |       date | person_id | sequence | company_id |
    |----------|------------|-----------|----------|------------|
    |        1 | 2012-08-31 |        11 |        1 |         10 |
    |        1 | 2012-08-31 |        12 |        2 |         10 |
    |        2 | 1999-04-16 |        21 |        1 |         10 |
    |        2 | 1999-04-16 |        22 |        2 |         10 |
    |        2 | 1999-04-16 |        23 |        3 |         10 |
    |        2 | 1999-04-16 |        24 |        4 |         10 |
    |        3 | 2001-01-09 |        31 |        1 |         30 |
    |        3 | 2001-01-09 |        11 |        2 |         30 |
    |        3 | 2001-01-09 |        12 |        3 |         30 |