
Fraction of overlapping members across multiple vectors


I have a cell array M with n cells, each containing several unique numbers, something like this:

{[15 16 21 26 28 145],[2 5 8 9 15],[20 24 27],[10 11 15 8 6 258 74 1],...}

Some of these values appear in more than one cell. I would like to calculate the fraction of values that overlap across these cells. For instance, the 4 cells above contain 19 unique numbers, and 2 of them belong to more than one cell: 15 and 8. Thus, the fraction of overlapping values is 2/19 ≈ 0.105. Note that the number of cells in M can vary, and so can the number of unique numbers in M. Does anyone have a suggestion on how to do this efficiently? I've tried using horzcat to concatenate the cells within M and then calling unique, but that didn't quite give me what I want.


Solution

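  • The first output of hist() is useful here: with one bin centered on each unique value, it returns how often each value occurs, and any count above 1 marks a value shared between cells.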

    M = {[15 16 21 26 28 145],[2 5 8 9 15],[20 24 27],[10 11 15 8 6 258 74 1]};
    % Concatenate all cells into one row vector (assumes every cell holds a
    % row vector).
    M2 = cell2mat(M);
    % Build a histogram with one bin centered on each unique value. The first
    % output of hist is the number of elements in each bin.
    a = hist(M2,unique(M2));
    % Fraction of overlapping values: the number of unique values that occur
    % more than once divided by the total number of unique values.
    overlap = sum(a>1)/numel(a);
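
The same counting can also be sketched without hist(), which newer MATLAB documentation marks as not recommended. The version below is an alternative, not the method from the answer above: it uses the third output of unique() together with accumarray(), and the variable names (allVals, uniqVals, idx, counts, shared) are chosen here purely for illustration. It assumes, as the question states, that no value is repeated inside a single cell; a within-cell repeat would be counted as an overlap. It also avoids one edge case of the hist() call: if M contains only one distinct value, hist() would treat the scalar second argument as a number of bins rather than a bin center.

```matlab
% Example data from the question; each cell is assumed to be a row vector
% with no repeated values inside a single cell.
M = {[15 16 21 26 28 145],[2 5 8 9 15],[20 24 27],[10 11 15 8 6 258 74 1]};

% Concatenate all cells into one row vector.
allVals = [M{:}];

% idx maps every element of allVals to its position in the sorted unique list.
[uniqVals, ~, idx] = unique(allVals);

% Count how many times each unique value occurs.
counts = accumarray(idx(:), 1);

% Values that appear in more than one cell, and their share of all unique values.
shared = uniqVals(counts > 1);
overlap = numel(shared) / numel(uniqVals);
```

For the four example cells, shared holds the values 8 and 15, and overlap equals 2/19, matching the figure worked out in the question.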