Search code examples
nested-loopsstata

Mata: Create a matrix that contains averages of all elements of 3 matrices


Define three row vectors, A = (1,2,3) B = (10,20,30,40) C = (100,200,300,400,500)

I want to construct a new matrix D which will be a have 3x4x5 = 60 elements and contains the averages of these elements as illustrated below:

D = 
(1+10+100)/3, (1+10+200)/3,…, (1+10+ 500)/3 \
(1+20+100)/3, (1+20+200)/3,…, (1+20+ 500)/3 \
(1+30+100)/3, (1+30+200)/3,…, (1+30+ 500)/3 \
(1+40+100)/3, (2+40+200)/3,…, (2+40+ 500)/3 \

(2+10+100)/3, (2+10+200)/3,…, (2+10+ 500)/3 \
(2+20+100)/3, (2+20+200)/3,…, (2+20+ 500)/3 \
(2+30+100)/3, (2+30+200)/3,…, (2+30+ 500)/3 \
(2+40+100)/3, (2+40+200)/3,…, (2+40+ 500)/3 \

(3+10+100)/3, (3+10+200)/3,…, (3+10+ 500)/3 \
(3+20+100)/3, (3+20+200)/3,…, (3+20+ 500)/3 \
(3+30+100)/3, (3+30+200)/3,…, (3+30+ 500)/3 \
(3+40+100)/3, (3+40+200)/3,…, (3+40+ 500)/3 \

The way it is set up in this example it will be a 12x5 matrix, but I am fine if it is a 1X60 vector or 60X1 vector.

How to do this efficiently in Mata? I am new to Mata and I had this running in Stata using multiple forval loops (in this case, there would be 3 forval loops). But this becomes very time consuming as I have up to 8 row vectors and about 120 elements in each of them.

I figured that I can use for loops in Mata and it will be much faster, but I believe if I can do this as a matrix manipulation instead of using for loops then it will be even faster. The problem is I am having a hard time visualizing how to write such a program (or if it's even possible) and any help would be highly appreciated.


Solution

  • The clever solution by @AspenChen offers huge speed gains over for loops, as shown with some testing:

    clear all
    set more off
    
    mata
    
    timer_clear() 
    
    //----- change data -----
    
    fa = 250
    fb = fa + 1
    fc = fa + 2
    
    
    //----- Method 1 -----
    
    timer_on(1)
    
    A = (1..fa)                 // 1 x fa
    B = (1..fb)*10              // 1 x fb
    C = (1..fc)*100             // 1 x fc
    
    F = J(1, cols(A) * cols(B) * cols(C), .)
    col = 0
    for (i=1; i<=cols(A); i++) {
        for (j=1; j<=cols(B); j++) {
            for (k=1; k<=cols(C); k++) {
    
                col++
                F[1,col] = A[1,i] + B[1,j] + C[1,k]
    
            }   
        }
    }
    
    timer_off(1)
    
    
    //----- Method 2 (Aspen Chen) -----
    
    timer_on(2)
    
    A = (1::fa)                 // fa x 1
    B = (1::fb)*10              // fb x 1
    C = (1::fc)*100             // fc x 1
    
    // tensor sum for A and B
    a = J(1,rows(B),1)          // 1 x fb with values of 1
    b = J(1,rows(A),1)          // 1 x fa with values of 1
    T = (A#a) + (b#B)'          // fa x fb
    
    T = vec(T)                  // fa*fb x 1
    
    // tensor sum for T and C
    c = J(1,rows(T),1)          // 1 x fa*fb with values of 1
    t = J(1,rows(C),1)          // 1 x fc with values of 1
    T = (C#c) + (t#T)'          // fc x fa*fb
    
    timer_off(2)
    
    timer()
    
    end
    

    Resulting in:

    timer report
      1.       8.78 /        1 =     8.776
      2.       .803 /        1 =      .803
    

    If the original poster still wants to use for loops due the large number of elements that will be compared, (s)he can use something along the lines of:

    <snip>
    
    larger = 0
    for (i=1; i<=cols(A); i++) {
        for (j=1; j<=cols(B); j++) {    
            for (k=1; k<=cols(C); k++) {
    
                larger = larger + (A[1,i] + B[1,j] + C[1,k] > 7)
    
            }   
        }
    }
    
    larger
    
    <snip>
    

    Edit

    Further tests with for loops only:

    clear all
    set more off
    
    mata
    
    timer_clear() 
    
    //----- change data -----
    
    fa = 500
    fb = fa + 1
    fc = fa + 2
    
    
    //----- Method 1 -----
    
    timer_on(1)
    
    A = (1..fa)                 // 1 x fa
    B = (1..fb)*10              // 1 x fb
    C = (1..fc)*100             // 1 x fc
    
    larger = 0
    for (i=1; i<=cols(A); i++) {
        for (j=1; j<=cols(B); j++) {
            for (k=1; k<=cols(C); k++) {
    
                larger = larger + (A[1,i] + B[1,j] + C[1,k] > 7)
    
            }   
        }
    }
    
    larger
    
    timer_off(1)
    
    
    //----- Method 2 (ec27) -----
    
    timer_on(2)
    
    A = (1..fa)                 // 1 x fa
    B = (1..fb)*10              // 1 x fb
    C = (1..fc)*100             // 1 x fc
    
    larger = 0
    for (i=1; i<=cols(A); i++) {
        for (j=1; j<=cols(B); j++) {
            for (k=1; k<=cols(C); k++) {
    
                placebo = A[1,i] + B[1,j] + C[1,k]
                if (placebo > 7) larger = larger + 1
    
            }   
        }
    }
    
    larger 
    
    timer_off(2)
    
    timer()
    
    end