Overview
An n×m matrix A and an n×1 vector Date are the inputs of the function S = sumdate(A,Date).
The function returns an n×m matrix S in which each row is the sum of all rows of A that have the same date as that row.
For example, if
A = [1 2 7 3 7 3 4 1 9;
6 4 3 0 -1 2 8 7 5]';
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012]';
then I would expect the returned matrix S to be
S = [15 9 9 6 7 6 15 15 15;
26 7 7 2 -1 2 26 26 26]';
Because Date(2) and Date(3) are the same, S(2,1) and S(3,1) are both equal to A(2,1) + A(3,1), and S(2,2) and S(3,2) are both equal to A(2,2) + A(3,2).
Because Date(1), Date(7), Date(8) and Date(9) are the same, S(1,1), S(7,1), S(8,1) and S(9,1) all equal A(1,1) + A(7,1) + A(8,1) + A(9,1), and S(1,2), S(7,2), S(8,2) and S(9,2) all equal A(1,2) + A(7,2) + A(8,2) + A(9,2).
The same holds for S([4,6],1) and S([4,6],2), since Date(4) and Date(6) are the same.
Because Date(5) does not repeat, S(5,1) = A(5,1) = 7 and S(5,2) = A(5,2) = -1.
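As a cross-check of the definition above, the rule can be restated row by row: row i of S is the column-wise sum of every row of A whose date equals Date(i). The sketch below applies that restatement to the example data. The name Sref is hypothetical, and the loop is quadratic in n, so it is only meant for verifying faster versions on small inputs.

```matlab
% Example inputs from above (each column of A is one series).
A    = [1 2 7 3 7 3 4 1 9; 6 4 3 0 -1 2 8 7 5]';
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012]';

% Row-by-row restatement of the definition (O(n^2), for checking only).
Sref = zeros(size(A));
for i = 1:size(A, 1)
    sameDate   = (Date == Date(i));        % rows sharing row i's date
    Sref(i, :) = sum(A(sameDate, :), 1);   % sum down the rows (dim 1)
end
disp(Sref)
```

The result should equal the expected S given in the question.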
The code I have written so far
Here is my attempt at this task:
function S = sumdate(A,Date)
S = A; % Preallocate S with the same size as A.
Dlist = unique(Date); % Sorted list of the distinct dates
for J = 1 : length(Dlist)
    loc = (Date == Dlist(J)); % Logical index of the rows whose date is Dlist(J)
    % Replace those rows with their column-wise sum. The explicit dim 1 keeps a
    % single selected row from being summed across its columns into a scalar.
    S(loc,:) = repmat(sum(S(loc,:),1),sum(loc),1);
end
end
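One subtlety in this loop is the dimension argument of sum. When a date occurs only once, S(loc,:) is a single 1×m row, and sum with no dimension argument then adds across that row (its first non-singleton dimension) instead of down the columns. A minimal illustration using row 5 of the example:

```matlab
row = [7 -1];             % a single selected row, e.g. A(5,:) from the example
total   = sum(row);       % no dim: sums along dimension 2, giving 6
colSums = sum(row, 1);    % dim 1: sums down the columns, giving [7 -1]
disp(total)
disp(colSums)
```

Passing 1 explicitly keeps the result 1×m for any number of selected rows.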
I tested it on my computer using A and Date with these attributes:
size(A) = [33055 400];
size(Date) = [33055 1];
length(unique(Date)) = 2645;
It took my PC about 1.25 seconds to perform the task.
This task is performed hundreds of thousands of times in my project, so my code is too slow. I think performance would improve if I could eliminate the for loop above.
I have found some built-in functions that compute special kinds of sums, such as accumarray and cumsum, but I still have no idea how to use them to eliminate the for loop.
I would appreciate your help.
Answer
You can do this with accumarray, but you'll need to generate a row subscript and a column subscript for every element of A (the element's date group and its column) to do it. Here's how:
[~, ~, index] = unique(Date); % Get indices of unique dates
subs = [repmat(index, size(A, 2), 1) ... % repmat to create row subscript
repelem((1:size(A, 2)).', size(A, 1))]; % repelem to create column subscript
S = accumarray(subs, A(:)); % Reshape A into column vector for accumarray
S = S(index, :); % Use index to expand S to original size of A
S =
15 26
9 7
9 7
6 2
7 -1
6 2
15 26
15 26
15 26
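To see what accumarray contributes here, it may help to look at a single column first. With one subscript per element, accumarray(index, values) adds up all values that share a subscript, which is exactly a grouped sum by date; the two-column subs in the answer only add a column subscript so that every column of A is summed in one call. A sketch on the first column of the example (variable names are illustrative):

```matlab
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012]';
a1   = [1 2 7 3 7 3 4 1 9]';          % first column of A from the question
[~, ~, index] = unique(Date);         % group number of each row (dates sorted)
groupSum = accumarray(index, a1);     % one sum per unique date
perRow   = groupSum(index);           % expand back to one value per row
disp(groupSum.')                      % 7 15 9 6
disp(perRow.')                        % 15 9 9 6 7 6 15 15 15
```

The unique dates sort as 160801, 161012, 161223, 170222, which is why groupSum starts with 7; indexing with index restores the original row order, so perRow matches the first column of S.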
Note #1: This will use more memory than your for-loop solution (subs will have twice as many elements as A), but it may give you a significant speed-up.
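If the memory cost of subs is a concern, one possible middle ground (a variant not taken from the answer above, and not benchmarked here) is to call accumarray once per column of A, which only needs the n×1 index vector. It reintroduces a loop, but over the m columns rather than the distinct dates; whether it is faster or slower than either version on your data would have to be measured. A sketch on the example data:

```matlab
A    = [1 2 7 3 7 3 4 1 9; 6 4 3 0 -1 2 8 7 5]';
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012]';
[~, ~, index] = unique(Date);              % group number of each row
G = zeros(max(index), size(A, 2));         % one row per unique date
for k = 1:size(A, 2)
    G(:, k) = accumarray(index, A(:, k));  % grouped sum of column k
end
S = G(index, :);                           % expand back to one row per row of A
disp(S)
```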
Note #2: If you are using a version of MATLAB older than R2015a, you won't have repelem. Instead, you can replace that line with one that uses kron (or another way of repeating each element):
kron((1:size(A, 2)).', ones(size(A, 1), 1))
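For a quick check that the replacement produces the intended column subscripts, here is a small case with n = 3 rows and m = 2 columns, alongside a repmat/reshape form that also avoids repelem (kron, repmat and reshape all predate R2015a). The small sizes are arbitrary, chosen only for illustration:

```matlab
n = 3;                                       % rows of A (small example)
m = 2;                                       % columns of A
cKron = kron((1:m).', ones(n, 1));           % each column number repeated n times
cRep  = reshape(repmat(1:m, n, 1), [], 1);   % same result via repmat + reshape
disp([cKron cRep])                           % both columns are [1;1;1;2;2;2]
```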