This seems like such a simple problem, but I can't find a good solution. I'm trying to select information from a slightly misformatted table. Basically, wherever sequence=0
, the person_id
should actually be a company_id
. This company_id
then applies to all the rows which have the same group_id
.
Someone thought it was a good idea to format things this way instead of simply having a company_id
column, but it makes trying to select by company very difficult. It would make my programming much easier to simply add this extra column, and fix the formatting.
I want to turn something like this:
+----------+------------+-----------+----------+
| group_id | date | person_id | sequence |
+----------+------------+-----------+----------+
| 1 | 2012-08-31 | 10 | 0 |
| 1 | 2012-08-31 | 11 | 1 |
| 1 | 2012-08-31 | 12 | 2 |
| 2 | 1999-04-16 | 10 | 0 |
| 2 | 1999-04-16 | 21 | 1 |
| 2 | 1999-04-16 | 22 | 2 |
| 2 | 1999-04-16 | 23 | 3 |
| 2 | 1999-04-16 | 24 | 4 |
| 3 | 2001-01-09 | 30 | 0 |
| 3 | 2001-01-09 | 31 | 1 |
| 3 | 2001-01-09 | 11 | 2 |
| 3 | 2001-01-09 | 12 | 3 |
+----------+------------+-----------+----------+
Into this:
+------------+----------+------------+-----------+----------+
| company_id | group_id | date | person_id | sequence |
+------------+----------+------------+-----------+----------+
| 10 | 1 | 2012-08-31 | 11 | 1 |
| 10 | 1 | 2012-08-31 | 12 | 2 |
| 10 | 2 | 1999-04-16 | 21 | 1 |
| 10 | 2 | 1999-04-16 | 22 | 2 |
| 10 | 2 | 1999-04-16 | 23 | 3 |
| 10 | 2 | 1999-04-16 | 24 | 4 |
| 30 | 3 | 2001-01-09 | 31 | 1 |
| 30 | 3 | 2001-01-09 | 11 | 2 |
| 30 | 3 | 2001-01-09 | 12 | 3 |
+------------+----------+------------+-----------+----------+
The only way I can think of how to achieve this is with nested SELECT statements, which are very inefficient considering I have about 100M rows. It's a one time fix though, so I don't mind letting it run overnight.
If you permanently want to change your table to include a company_id
column then do this:
First alter the table and add the new column:
alter table your_table add company_id int;
Then update all rows to set the company to the person_id = 0
for the group:
UPDATE your_table a
JOIN your_table b ON a.group_id = b.group_id
SET a.company_id = b.person_id
WHERE b.sequence = 0;
And finally remove the rows with sequence = 0
:
DELETE FROM your_table WHERE sequence = 0;
The end result will be:
| group_id | date | person_id | sequence | company_id |
|----------|------------|-----------|----------|------------|
| 1 | 2012-08-31 | 11 | 1 | 10 |
| 1 | 2012-08-31 | 12 | 2 | 10 |
| 2 | 1999-04-16 | 21 | 1 | 10 |
| 2 | 1999-04-16 | 22 | 2 | 10 |
| 2 | 1999-04-16 | 23 | 3 | 10 |
| 2 | 1999-04-16 | 24 | 4 | 10 |
| 3 | 2001-01-09 | 31 | 1 | 30 |
| 3 | 2001-01-09 | 11 | 2 | 30 |
| 3 | 2001-01-09 | 12 | 3 | 30 |