and thank you for showing interest in my post :)
Let me quickly describe the data I'm working with:
I'm important data from excel into my matlab script so that I end up with 3 vectors. These are:
Now, what I want to achieve:
create a vector that counts the number of times each row in "sector" is appearing in "FIT_txt". E.g.: say the first entry in "sector" is "AB10", then I want to create a vector that counts the number of times "AB10" occurs in "FIT_txt"; this should be saved in the first row of the new vector. The second row of the new vector counts the appearances of row 2 from "sector" (say "AB11") in "FIT_txt", etc.
create a vector that adds all the numbers in "FIT_num" that occur in the same row as the entries in "FIT_txt" that correspond to a "sector" entry, and sort it accordingly. E.g.: "AB10" in row 1 of "sector" occurs 3 times in "FIT_txt" - it occurs in row 2, 500 & 2000. I want to add up rows 2, 500 & 2000 in "FIT_num" and put them into the first row of the new vector.
Now, this is a simplification of my problem; essentially, I am using a lot more data and I'm repeating this process to find a much larger number of new vectors/matrices. With regard to the simplification, I have managed to solve the problem by doing this:
units = zeros(length(sector),1);
installed = zeros(length(sector),1);
for a = 1:length(sector)
for z = 1:length(FIT_txt(:,1))
if strcmp((FIT_txt(z,1)),(sector(a)))==1
units(a,1) = units(a,1) + 1;
installed(a,1) = installed(a,1) + FIT_num(z,1);
end
end
end
Unfortunately, I fear this is very inefficient and takes way too long to calculate.
I have managed to achieve decent results for my first question (counting the appearances) using this:
units = zeros(length(sector),1);
for a = 1:length(sector)
units(a,1) = sum(strcmp((FIT_txt(:,1)),(sector(a)))
end
That works fine (although still taking a bit longer than I would like), however, I don't know how to solve my second question (counting the values from "FIT_num" that correspond).
I would appreciate if you could assist me in finding a solution that is as efficient as possible in solving my problem.
Thank you very much in advance.
John
Let's solve this step by step: I'm assuming FIT_txt
is a Nx1
cell array and sector
is a Mx1
cell array.
First find the rows in FIT_txt where sector(ii)
appears in, and also how many times:
ii=1;
sector_occurrence = strfind(FIT_txt,sector(ii));
This gives you N
cells for each row of FIT_txt with in each cell the actual location at each row where sector(ii)
is found. You just need the count per row, so use numel to get just that:
cellfun(@numel,sector_occurrence );
You can do this for every element of sector
with a loop or with arrayfun:
sector_occurrence = cell2mat(arrayfun(@(ii) cellfun(@numel,strfind(FIT_txt,sector{ii})),1:numel(sector),'uni',false'));
Now you have a NxM matrix. The element at row i and column j tells how many times sector(j)
occurs in FIT_txt(i)
.
*reading your comments... Aah, but if a sector is found, FIT_txt is exactly equal to the sector, which allows you the simplify all the above this with strcmp and arrayfun:
sector_occurrence = cell2mat(arrayfun(@(sectorii) strcmp(FIT_txt,sectorii), sector', 'uni',false))
Now onto the summation:
It's just a vector product of the occurrence vector with FIT_num
, and you can get them all in one call by multiplying the sector_occurrence
matrix with FIT_num
:
sumresult = sector_occurrence'*FIT_num;
note the transpose operator '
because sector_occurrence
was defined as NxM
.
>> sector=[{'AB10'} ; {'b'}];
>> FIT_txt=[{'AB10'} ; {'a'} ; {'b'} ; {'ZX5b'} ; {'AB10'} ; {'b'}];
>> FIT_num = (1:6)';
>> sector_occurrence = strfind(FIT_txt,sector(ii))
sector_occurrence =
1 0
0 0
0 1
0 0
1 0
0 1
>> sumresult = sector_occurrence'*FIT_num
sumresult =
6
9
Of course, it has to be an exact match (which you said it was) b
won't be equal to B
and also not to bb
, only b
.
sector_occurrence
is a logical
array, so it doesn't need that much memory. But when executing the multiplication with FIT_num
, it gets converted to float
, which needs 8 times as much space. The end result is a small (in memory) vector, but the intermediary process can melt your pc. You can avoid this by doing the multiplication in a loop:
sumresult=NaN(numel(sector),1); %preallocation is a good thing
for ii=1:numel(sector)
sumresult(ii)=sector_occurrence(:,ii)'*FIT_num;
end