matlab performance permutation combinatorics matrix-indexing

Generate all possible combinations of pairs of triplets from 3 vectors together with their original coordinates

Given 3 row vectors A,B,C in Matlab, I want to generate the matrix D reporting all the possible combinations of pairs of triplets from A,B,C together with their original coordinates in A,B,C.

I have written a code that does what I want. As I'm trying to optimise my code as much as I can (the code should be repeated millions of times), I would like to know whether you can think of more efficient solutions. For example, in my code I am not pre-allocating the matrix D because I don't know how to get the index of each pair of triplets and this is not efficient.

The code below will explain it better:

clear 
A=[1 2];
B=[-4 -2 5];
C=[8 9 -3 0];

sA=size(A,2);
sB=size(B,2);
sC=size(C,2);
sT=sA*sB*sC;

%Generate the matrix D of dimension [sT*(sT-1)/2]x[12]
%reporting all the possible combinations of pairs of triplets from A,B,C
%together with their original coordinates in A,B,C

[ca, cb, cc] = ndgrid(A, B, C);
T = [ca(:), cb(:), cc(:)];  %matrix of dimension sTx3 reporting all the possible triplets 
                            %from A,B,C

[ca, cb, cc] = ndgrid(1:sA, 1:sB, 1:sC);
Tcoord = [ca(:), cb(:), cc(:)];  %matrix of dimension sTx3 reporting the coordinates of all 
                                 %the possible triplets from A,B,C

D=[];
for w=1:sA*sB*sC
    for r=w+1:sA*sB*sC 
        D=[D; T(w,:) T(r,:) Tcoord(w,:) Tcoord(r,:)];
    end
end

Solution

The last nested for loop that populates the matrix D can be much more efficient. The OP is spot on in their statement:

"For example, in my code I am not pre-allocating the matrix D because I don't know how to get the index of each pair of triplets and this is not efficient."

We can vectorize much of the work in those loops by noticing that there is a pattern which the OP alludes to in their comments about the final size of the matrix D (i.e. Generate the matrix D of dimension [sT*(sT-1)/2]x[12]). That first dimension will look familiar to anybody that has worked with series and sequences quite a bit. It is the formula of Triangle Numbers.

With this in mind, we can see that in the final result, the first 3 columns (and columns 7 through 9) repeats 23 times, then 22 times, and so on while the columns 4 through 6 (and 10 through 12) are the last 23 rows of T/Tcoord, the last 22 rows of T/Tcoord and so on. In code we have:

D1 = zeros(sT * (sT - 1) / 2, 12);
s = 1;
e = sT - 1;

for w = 1:(sT - 1)
    D1(s:e,[1:3,7:9]) = repmat([T(w,:),Tcoord(w,:)], sT - w, 1);
    D1(s:e,[4:6,10:12]) = [T((w+1):sT,:),Tcoord((w+1):sT,:)];
    s = e + 1;
    e = e + (sT - (w + 1));
end

And running each method 200 times using tic and toc, we see that we have nearly a 35% increase in efficiency.

% OP's setup code goes here

tic
for i=1:200
    D=[];
    for w=1:sA*sB*sC
        for r=w+1:sA*sB*sC
            D=[D; T(w,:) T(r,:) Tcoord(w,:) Tcoord(r,:)];
        end
    end
end
toc

tic
for i = 1:200
    D1 = zeros(sT * (sT - 1) / 2, 12);
    s = 1;
    e = sT - 1;

    for w = 1:(sT - 1)
        D1(s:e,[1:3,7:9]) = repmat([T(w,:),Tcoord(w,:)], sT - w, 1);
        D1(s:e,[4:6,10:12]) = [T((w+1):sT,:),Tcoord((w+1):sT,:)];
        s = e + 1;
        e = e + (sT - (w + 1));
    end
end
toc

% Gives same result
isequal(D, D1)

% Timing for 200 runs on 24 total combinations
Elapsed time is 2.09613 seconds.
Elapsed time is 1.35988 seconds.
ans = 1

If we make the input vectors larger, we see an even greater improvement in efficiency. Below are the results of running over each method 15 times on the following vectors:

A=[1 2 3 4 23];
B=[-4 -2 5 74];
C=[8 9 -3 0];

% Timing for 15 run on 80 total combinations
Elapsed time is 4.00448 seconds.
Elapsed time is 0.379919 seconds.
ans = 1

That is more than 10x time faster. The gap grows exponentially as the size of your input vectors gets larger.

A=[1 2 3 4 23];
B=[-4 -2 5 74 28];
C=[8 9 -3 0 -100 -5];

% Timing for 1 run on 150 total combinations
Elapsed time is 3.63065 seconds.
Elapsed time is 0.0481789 seconds.
ans = 1

That is about 75 times faster!!!

Update

The OP has given a far superior answer in the comments:

indices=nchoosek((1:1:sT),2);
D=[T(indices(:,1),:) T(indices(:,2),:) Tcoord(indices(:,1),:) Tcoord(indices(:,2),:)];

Here is the code I benchmarked with:

clear 
A=[1 2 3 4 23 24 25 26];
B=[-4 -2 5 74 28 10 11 12 13];
C=[8 9 -3 0 -100 -5 60 120];

sA=size(A,2);
sB=size(B,2);
sC=size(C,2);
sT=sA*sB*sC;

tic
for i = 1:10
    [ca, cb, cc] = ndgrid(A, B, C);
    T = [ca(:), cb(:), cc(:)];
    [ca, cb, cc] = ndgrid(1:sA, 1:sB, 1:sC);
    Tcoord = [ca(:), cb(:), cc(:)];

    D1 = zeros(sT * (sT - 1) / 2, 12);
    s = 1;
    e = sT - 1;

    for w = 1:(sT - 1)
        D1(s:e,[1:3,7:9]) = repmat([T(w,:),Tcoord(w,:)], sT - w, 1);
        D1(s:e,[4:6,10:12]) = [T((w+1):sT,:),Tcoord((w+1):sT,:)];
        s = e + 1;
        e = e + (sT - (w + 1));
    end
end
toc

tic
for i = 1:10
    indices=nchoosek((1:1:sT),2);
    D=[T(indices(:,1),:) T(indices(:,2),:) Tcoord(indices(:,1),:) Tcoord(indices(:,2),:)];
end
toc

isequal(D, D1)

And here are the results:

% Timing for 10 runs on 576 total combinations
Elapsed time is 1.9834 seconds.
Elapsed time is 0.13818 seconds.
ans = 1

The improved solution I provided is better than the original by a good margin, but is no match for the updated solution by the OP. It is orders faster and quite elegant, I might add.