Search code examples
matlabmergedatasetunique

Merging rows with the same date in a Matlab table


I'm writing this post in the hope that someone can help me out with a little problem that requires a good solution. I have a table with the following structure:

column1 = datenum | column2 = country | column3 = value1 | column4 = value2

Let's say I load the following dataset (I manually added the first column to better show you the rows offsets and should not be considered as part of the dataset):

1   736561  'USA'       2752    251
2   736561  'USA'       184     53
3   736561  'USA'       40      0
4   736572  'England'   1       0
5   736573  'USA'       1       0
6   736575  'USA'       1       0
7   736576  'England'   1       0
8   736577  'USA'       2       0
9   736580  'USA'       1       1
10  736581  'USA'       1       0
11  736582  'USA'       1       0
12  736599  'USA'       1       0
13  736619  'USA'       5       0
14  736619  'France'    1       1
15  736683  'USA'       1       0

Now, what I need to to is to merge together the rows with the same date. As you can see, this is the case for the rows in the intervals 1:3 and 13:14. I have to do this following a few simple criteria:

  • if the country in the duplicate rows is always the same, the final row should still show that country, otherwise it must show "Multiple";
  • value1 and value2 of the final row must be the sum of value1 and value2 of the duplicate rows.

Following those criteria, the table in the above example should become (once again, the first column is here in order to simplify data visualization and should not be taken into account by the code):

1   736561  'USA'       2976    304
2   736572  'England'   1       0
3   736573  'USA'       1       0
4   736575  'USA'       1       0
5   736576  'England'   1       0
6   736577  'USA'       2       0
7   736580  'USA'       1       1
8   736581  'USA'       1       0
9   736582  'USA'       1       0
10  736599  'USA'       1       0
11  736619  'Multiple'  6       1
12  736683  'USA'       1       0

Solution

  • After many debugging sessions, I came up with this solution:

    data = cell2table(data,'VariableNames',{'Date','Country','Value1','Value2'});
    
    [dat_uni,~,dat_idx] = unique(data.Date);
    
    [cty_uni,~,cty_idx] = unique(data.Country);
    cty_uni = [cty_uni; 'Multiple'];
    
    cty_tmp = accumarray(dat_idx,cty_idx,[max(dat_idx) 1],@(x) {unique(x)});
    mult = cellfun(@(x) length(x) > 1,cty_tmp);
    cty_tmp{mult} = max(cty_idx) + 1;
    cty_tmp = cat(1,cty_tmp{:});
    
    data_new = table(dat_uni,cty_uni(cty_tmp),accumarray(dat_idx,data.Value1),accumarray(dat_idx,data.Value2));