Search code examples
matlabsortingmatlab-table

Index of Max Value from by grouping using varfun


I have a table with Ids and Dates. I would like to retrieve the index of the max date for each Id.

My initial approach is so: varfun(@max, table, 'Grouping Variables', 'Id', 'InputVariables','Date');

This obviously gives me the date rather than the index. I noted that the max function will return both the maxvalue and maxindex when specified: [max_val, max_idx] = max(values);

How can I define an anonymous function using max to retrieve max_idx? I would then use it in the var_fun to get my result.

I'd prefer not to declare a cover function (as opposed to a anon func)over max() as: 1. I'm working in a script and would rather not create another function file 2. I'm unwilling to change my current script to a function

Thanks a million guys,


Solution

  • I don't think varfun is the right approach here, as

    varfun(func,A) applies the function func separately to each variable of the table A.

    This would only make sense if you wanted to apply it to multiple columns.

    Simple approach:

    Simply go with the loop approach: First find the different IDs using unique, then for each ID find the indices of the maximum dates. (This assumes your dates are in a numerical format which can be compared directly using max.) I did rename your variable table to t, as otherwise we would be overwriting the built-in function table.

    uniqueIds = unique(t.Id);
    for i = 1:numel(uniqueIds)
        equalsCurrentId = t.Id==uniqueIds(i); 
        globalIdxs = find(equalsCurrentId);
        [~, localIdxsOfMax] = max(t.Date(equalsCurrentId));
        maxIdxs{i} = globalIdxs(localIdxsOfMax);
    end
    

    As you mentioned your Ids are actually strings instead of numbers, you will have to change the line: equalsCurrentId = t.Id==uniqueIds(i); to

     equalsCurrentId = strcmp(t.Id, uniqueIds{i});
    

    Approach using accumarray:

    If you prefer a more compact style, you could use this solution inspired by Luis Mendo's answer, which should work for both numerical and string Ids:

    [uniqueIds, ~, global2Unique] = unique(t.Id);
    maxDateIdxsOfIdxSubset = @(I) {I(nth_output(2, @max, t.Date(I)))};
    maxIdxs = accumarray(global2Unique, 1:length(t.Id), [], maxDateIdxsOfIdxSubset);
    

    This uses nth_output of gnovice's great answer.

    Usage:

    Both above solutions will yield: A vector uniqueIds with a corresponding cell-array maxIdxs, in a way that maxIdxs{i} are the indices of the maximum dates of uniqueIds(i). If you only want a single index, even though there are multiple entries where the maximum is attained, use the following to strip away the unwanted data:

    maxIdxs = cellfun(@(X) X(1), maxIdxs);