Tags: arrays, matlab, if-statement, search

Other ways to efficiently search within an array


I started with this code (without the commented-out lines):

mol1...;mol2...;
r1 = size(e, 1);  % number of candidates for aligned amino acid
r0 = size(e0, 1); % number of candidates for reference amino acid
for i = 1 : r1
    %if e(i, 1) > 4
        for j = 1 : r0
            %if e0(j, 1) > 4
                if e(i, 1) == e0(j, 1)
                    eI(i, j) = e(i, 1); % number of atoms matched
                    eT(i, j) = abs((e(i, 2) - e0(j, 2)) / e0(j, 2) * 100);
                end
            %end
        end
    %end
end

mol1 and mol2 list the combinations of selected atoms together with the number of atoms selected: e.g. for a 3-atom molecule, (1, 1,0,0), (1, 0,1,0), ..., (3, 1,1,1). e and e0 hold some geometry-related numbers.

With more atoms, the arrays can reach 200 000 rows. I thought it wouldn't hurt to drop combinations of fewer than 5 atoms (the commented-out ifs above), but the code did not run faster, so the problem is with the ifs. Next I tried deleting the combinations of fewer than 5 atoms, keeping the original indexes so that I can rebuild the initial array afterwards:

e (:, 7) = find(e (:, 1));
e0(:, 7) = find(e0(:, 1));
e (e (:, 4) < 5, :) = [];
e0(e0(:, 4) < 5, :) = [];
...

This halved the time. I timed the 500-line program with tic/toc, and this part is the bottleneck. At this speed it would take 2 years for the 300 molecules I have chosen so far, and I would like to add more (20000).
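The filter-and-rebuild idea described above can be sketched as follows. This is only a minimal illustration under assumptions: the sample values, the `minAtoms` cutoff, and the names `keep`, `keep0`, `es`, `e0s` are hypothetical, the atom count is assumed to sit in column 1, and the comparison uses implicit expansion (MATLAB R2016b or later) in place of the nested loops.

```matlab
% Hypothetical sample data: column 1 = number of atoms, column 2 = a geometric value
e  = [3 1.0; 5 2.0; 6 3.0; 5 4.0];
e0 = [5 1.0; 4 2.0; 6 6.0];
minAtoms = 5;   % assumed cutoff: ignore combinations with fewer atoms

% Remember the original row numbers of the rows that survive the filter
keep  = find(e(:, 1)  >= minAtoms);
keep0 = find(e0(:, 1) >= minAtoms);
es  = e(keep, :);
e0s = e0(keep0, :);

% Compare only the surviving rows (implicit expansion builds the pairwise grid)
match = es(:, 1) == e0s(:, 1).';
eTs   = match .* abs((es(:, 2) ./ e0s(:, 2).' - 1) * 100);

% Scatter the small result back into a matrix indexed by the original rows
eT = zeros(size(e, 1), size(e0, 1));
eT(keep, keep0) = eTs;
disp(eT)
```

Rows and columns that were filtered out stay zero in `eT`, so the result has the same shape as it would without filtering.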

So what other ways of discarding combinations from my array can you think of? Maybe I should choose the cutoff per molecule size (a 15-atom molecule can discard results of 5 atoms; an 8-atom one, 4). If changing the numeric precision would reduce this time, how should I do it?


Version 2: cannot submit - webpage won't allow - see comments


Version 3 (50x faster than v2):

e(:, 7) = find(e(:, 1));   % keep the original row indexes
e0(:, 7) = find(e0(:, 1));
[val, ia, ib] = intersect(e(:, 1), e0(:, 1));
for i = 1 : numel(ia)
    for j = 1 : numel(ib)
        eI(e(ia(i), 7), e0(ib(j), 7)) = e(ia(i), 1);
        eT(e(ia(i), 7), e0(ib(j), 7)) = abs((e(ia(i), 2) - e0(ib(j), 2)) / e0(ib(j), 2) * 100);
    end
end

Solution

  • Here is a vectorized form that computes the result without any for loop:

    e_eq_e0 = e(:, 1) == e0(:, 1).';
    
    eI = e_eq_e0 .* e(:, 1);
    eT = e_eq_e0 .* abs((e(:, 2) ./ e0(:, 2).'  - 1) * 100);
    

    However, the main problem in your code is that you don't pre-allocate the matrices eI and eT before using them, so MATLAB has to grow them repeatedly inside the loops:

    eI = zeros(r1, r0);
    eT = zeros(r1, r0);
    for i = 1 : r1
    ....
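
As a quick sanity check, the pre-allocated loop and the vectorized form can be compared on a small example. This is a sketch only: the sample values for `e` and `e0` and the `_loop` / `_vec` variable names are made up for illustration, and the vectorized lines rely on implicit expansion (MATLAB R2016b or later).

```matlab
% Hypothetical sample data: column 1 = number of atoms, column 2 = a geometric value
e  = [5 1.0; 6 2.0; 5 3.0];
e0 = [5 2.0; 7 4.0];
r1 = size(e, 1);
r0 = size(e0, 1);

% Loop version with pre-allocated outputs
eI_loop = zeros(r1, r0);
eT_loop = zeros(r1, r0);
for i = 1 : r1
    for j = 1 : r0
        if e(i, 1) == e0(j, 1)
            eI_loop(i, j) = e(i, 1);
            eT_loop(i, j) = abs((e(i, 2) - e0(j, 2)) / e0(j, 2) * 100);
        end
    end
end

% Vectorized version: the logical mask zeroes out non-matching pairs
e_eq_e0 = e(:, 1) == e0(:, 1).';
eI_vec = e_eq_e0 .* e(:, 1);
eT_vec = e_eq_e0 .* abs((e(:, 2) ./ e0(:, 2).' - 1) * 100);

% Both checks should display 1 (true)
disp(isequal(eI_loop, eI_vec))
disp(max(abs(eT_loop(:) - eT_vec(:))) < 1e-12)
```

The second check uses a tolerance because the two versions compute the percentage with a different order of floating-point operations.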