Search code examples
matlabmatrixcellreshape

Reshape multiple columns to cell


I have an unbalanced panel of city-year level data. After padding missing values with NaNs, I import the data as columns into matlab. For example:

city     year      population       area
nyc      2000        140             22
nyc      2001        151             22
nyc      2002        NaN             22.5 
nyc      2003        NaN             22.5
boston   2000        NaN             10
boston   2001        75              11
boston   2002        76              10.5
boston   2003        77              10   
chicago  2000        43              34
chicago  2001        42              31
chicago  2002        43              NaN
chicago  2003        45              34

I can reshape each column to a matrix, in which one column represent one city. For example: mPopulation = reshape(population,[4,3]) gives

140  NaN  43
151  75   42
NaN  76   43
NaN  77   45

Similarly, mArea = reshape(area,[4,3]) gives

22     10    34
22     11    34
22.5   10.5  NaN
22.5   10    34

Note that (1) mPopulation and mArea have NaN in different positions. (2) Within a matrix, different cities have different number of missing values.

Later analysis cannot be performed on missing values, so I think I need to reshape each column to a cell, as it allows for columns with different number of rows.

Therefore, the questions are (1) how to convert the matrices into cells? (2) how to removing missing values for the same city across different cells, so each cell has the same number of rows for the same city?

The final results we want to get are two cells cPopulation and cArea:

140  75   43
151  76   42
     77   45        

and

22    11       34
22    10.5     34
      10       34          

You can see that, since population is missing for nyc-2002 and nyc-2003, we also removed the area data for nyc in these two years. After removing missing entries in this way, nyc only has 2 rows of data left, while boston and chicago have 3 rows of data. So the final data can't be stored in matrices...

EDIT

Another way to do this is, removing missing values from the main matrix first:

city     year      population       area
nyc      2000        140             22
nyc      2001        151             22
boston   2001        75              11
boston   2002        76              10.5
boston   2003        77              10   
chicago  2000        43              34
chicago  2001        42              31
chicago  2003        45              34

However, it's hard to convert each column into a cell. Since different cities have different number of rows already, we cannot use the reshape command, i.e. cannot tell matlab it's 4-by-3 for each city...


Solution

  • Before answering I should say that @cris-luengo comment is a much better idea and I would also suggest to handle NaNs rather than using cell arrays.

    However, one approach is:

    data = readtable('path\to\data');
    cleanedData = data(~any(isnan(data{:, {'population', 'area'}}), 2), :);
    [uniqueCities, ~, cityId] = unique(cleanedData.city, 'stable');
    cPopulation = arrayfun(@(x) cleanedData.population(cityId == x), 1:length(uniqueCities), 'uni', 0);
    cArea = arrayfun(@(x) cleanedData.area(cityId == x), 1:length(uniqueCities), 'uni', 0);
    

    Here we first create a new copy of the data which has rows removed if either population or area are NaN.

    Next we use the third output of the unique function to create a vector identifying the location of the unique cities in the table.

    Finally, using the arrayfun function we extract the required data, specifying that the output should be a cell array by passing the name-value pair uni and false