Search code examples
matlabmatrixcountmember

Extract "n" rows for each ID in a matrix


I have a matrix which has multiple rows per ID. I need to extract only say..4 LAST rows per ID. ID cols consists of 2 Cols:

mat = [ ...
        2222 1   734771 0.11
        2222 1   734772 0.22
        2222 1   734773 0.33
        2222 1   734774 0.44
        2222 1   734775 0.55
        5555 3   734771 0.11
        5555 1   734772 0.12
        5555 1   734773 0.13
      ] ;

answer --> % [ 2222 1 734772 0.22 ; 2222 1 734773 0.33 ; 2222 1 734774 0.44 ; 2222 1 734775 0.55]

I'll appreciate a vectorized approach.


Solution

  • Here is a efficient solution using UNIQUE and ACCUMARRAY functions:

    %# sample matrix, sorted by first column
    mat = [randi([1 5],[20 1]) (1:20)' rand(20,1)];   %'
    mat = sortrows(mat);
    
    %# extract last N rows from each unique ID
    N = 4;
    [~,~,subs] = unique( mat(:,1) );                 %# index into unique values
    fcn = @(x){ x(max(1,end-N+1):end) };             %# at most last N values
    ind = accumarray(subs, 1:numel(subs), [max(subs) 1], fcn);
    ind = cell2mat( ind(cellfun(@numel,ind) >= 4) ); %# keep only those with 4+ count
    result = mat(ind,:)
    

    Using the third output of the UNIQUE function, we get for each row the index into the list of unique IDs. Then we distribute the row indices into bins (cell array) based on those subscripts. We take the last four indices from each bin, and filter those with less than 4 occurrences. We then combine them all into one vector of row indices. Finally we use it to get the corresponding rows from the original matrix.

    Using the example above, we generate the following matrix:

    mat =
                1            2      0.70199
                1            6      0.46313
                1            7      0.98821
                1           12      0.15645
                1           13      0.67037
                1           16      0.86966
                2            8      0.63491
                2            9     0.076501
                2           15      0.55076
                2           17      0.44727
                2           19      0.30587
                3            5      0.91502
                3           10      0.97322
                3           20      0.48231
                4            3      0.45633
                4            4      0.12363
                4           11      0.18319
                4           14      0.36045
                5            1      0.36708
                5           18      0.63084
    

    and the result was:

    result =
                1            7      0.98821
                1           12      0.15645
                1           13      0.67037
                1           16      0.86966
                2            9     0.076501
                2           15      0.55076
                2           17      0.44727
                2           19      0.30587
                4            3      0.45633
                4            4      0.12363
                4           11      0.18319
                4           14      0.36045