I have a table with Ids and Dates. I would like to retrieve the index of the max date for each Id.
My initial approach is so: varfun(@max, table, 'Grouping Variables', 'Id', 'InputVariables','Date');
This obviously gives me the date rather than the index. I noted that the max function will return both the maxvalue and maxindex when specified: [max_val, max_idx] = max(values);
How can I define an anonymous function using max to retrieve max_idx? I would then use it in the var_fun to get my result.
I'd prefer not to declare a cover function (as opposed to a anon func)over max() as: 1. I'm working in a script and would rather not create another function file 2. I'm unwilling to change my current script to a function
Thanks a million guys,
I don't think varfun
is the right approach here, as
varfun(func,A)
applies the functionfunc
separately to each variable of the tableA
.
This would only make sense if you wanted to apply it to multiple columns.
Simply go with the loop approach: First find the different IDs using unique
, then for each ID find the indices of the maximum dates. (This assumes your dates are in a numerical format which can be compared directly using max
.)
I did rename your variable table
to t
, as otherwise we would be overwriting the built-in function table
.
uniqueIds = unique(t.Id);
for i = 1:numel(uniqueIds)
equalsCurrentId = t.Id==uniqueIds(i);
globalIdxs = find(equalsCurrentId);
[~, localIdxsOfMax] = max(t.Date(equalsCurrentId));
maxIdxs{i} = globalIdxs(localIdxsOfMax);
end
As you mentioned your Ids
are actually strings instead of numbers, you will have to change the line: equalsCurrentId = t.Id==uniqueIds(i);
to
equalsCurrentId = strcmp(t.Id, uniqueIds{i});
accumarray
:If you prefer a more compact style, you could use this solution inspired by Luis Mendo's answer, which should work for both numerical and string Ids:
[uniqueIds, ~, global2Unique] = unique(t.Id);
maxDateIdxsOfIdxSubset = @(I) {I(nth_output(2, @max, t.Date(I)))};
maxIdxs = accumarray(global2Unique, 1:length(t.Id), [], maxDateIdxsOfIdxSubset);
This uses nth_output
of gnovice's great answer.
Both above solutions will yield: A vector uniqueIds
with a corresponding cell
-array maxIdxs
, in a way that maxIdxs{i}
are the indices of the maximum dates of uniqueIds(i)
.
If you only want a single index, even though there are multiple entries where the maximum is attained, use the following to strip away the unwanted data:
maxIdxs = cellfun(@(X) X(1), maxIdxs);