I have a cell-array with 600 000 rows and 5 columns. In the following example I only present 3 different codes and a period of 5 years. Input:
c1 c2 c3 c4 c5
1 2006 20060425 559 'IA'
1 2007 20070129 559 'LO'
1 2007 20070826 559 'VC'
1 2008 20080825 34 'VP'
1 2009 20090116 34 'ZO'
4 2007 20070725 42 'OI'
4 2008 20080712 42 'TF'
4 2008 20080428 42 'XU'
11 2007 20070730 118 'AM'
11 2008 20080912 118 'HK'
11 2009 20090318 2 'VT'
11 2010 20100121 2 'ZZ'
I would like to obtain a new variable that gives for each code (c1
) the years in which c1
appears in the sample and the corresponding c4
value. For instance:
Output:
x 2006 2007 2008 2009 2010
1 559 559 34 34 -
4 - 42 42 - -
11 - 118 118 2 2
To get to my cell-array, this is the code I used so far:
a1=T_ANNDAT3;
a2=I{:,7};
a3=I{:,6};
a4=I{:,16};
a5=I{:,1};
TRACK_AN = [num2cell([a2 a1 a4 a3]) a5];
TRACK_AN(cell2mat(TRACK_AN(:,1))==0,:)=[];
[~,indTA,~] = unique(strcat(TRACK_AN(:,1),TRACK_AN(:,2),TRACK_AN(:,4),TRACK_AN(:,5)));
TRACK_AN = TRACK_AN(indTA,:);
Can someone help?
You can calculate this very easily using unique
as what you have seen. The key is to use the 'rows'
flag as the second parameter into unique
so you can figure out the unique row entries for the matrix. We only need the first, second and fourth columns of the matrix for this process so we can just subset those columns out. You also need to use the additional output parameters of unique
so we can figure out where exactly the unique rows appear in the original cell array. This is the key property we need for the next part of the algorithm.
After you find the unique cell array from the first unique
call, we apply unique
two more times - One for the column of c1
and one more for the column of c2
so we can index the ID and the year. We will use the third output parameter of unique
so that we can assign each unique number within each column into a unique ID. We then use accumarray
to create the final matrix that you see above, binning the values in the fourth column given the first column serving as rows and the second column serving as columns for this final matrix. In other words:
%// Create cell array as per your example
C = {1 2006 20060425 559 'IA'
1 2007 20070129 559 'LO'
1 2007 20070826 559 'VC'
1 2008 20080825 34 'VP'
1 2009 20090116 34 'ZO'
4 2007 20070725 42 'OI'
4 2008 20080712 42 'TF'
4 2008 20080428 42 'XU'
11 2007 20070730 118 'AM'
11 2008 20080912 118 'HK'
11 2009 20090318 2 'VT'
11 2010 20100121 2 'ZZ'};
%// Get only those columns that are relevant
%// These are the first, second and fourth columns
Cmat = unique(cell2mat(C(:,[1 2 4])), 'rows');
%// Bin each of the first and second columns
%// Give them a unique ID per unique number
[~,~,ind] = unique(Cmat(:,1));
[~,~,ind2] = unique(Cmat(:,2));
%// Use accumarray to create your matrix
%// Edit - Thanks to Amro
%// Any values that are missing replace with NaN
finalMat = accumarray([ind ind2], Cmat(:,3), [], [], NaN);
The output is thus:
finalMat =
559 559 34 34 NaN
NaN 42 42 NaN NaN
NaN 118 118 2 2
I replaced those values that were missing with NaN
to signify the missing values.
Hope this helps!