
Better method to calculate half of symmetrical numpy matrix?


Each cell of my matrix is a score computed by an expensive function. The matrix is symmetric, and this is the best method I could think of to populate each cell:

num_cases = len(case_dictionary.keys())  # num_cases = 10
SmallMatrix = np.zeros((num_cases,num_cases))

for CasesX in range(0,num_cases):
    for CasesY in range(CasesX,num_cases):
        SmallMatrix[CasesX,CasesY] = 1

returns:

array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])

easy enough...

However, when the matrix is much bigger and the computation is expensive, is the nested for loop still the most efficient solution?

num_cases = len(case_dictionary.keys())  # 100000
BigMatrix = np.zeros((num_cases,num_cases))

for CasesX in range(0,num_cases):
    for CasesY in range(CasesX,num_cases):
        BigMatrix[CasesX,CasesY] = ExpensiveFunction()

slow... due to my function, or the loop?

EDIT

I'm continually working with pairwise data, so I went back and tried @hpaulj's solution. I'm not knowledgeable enough to understand why testUpper() is faster.

def testUpper(func):
    num_cases = 100
    BigMatrix = np.zeros((num_cases, num_cases))

    upper = np.triu_indices_from(BigMatrix)

    BigMatrix[upper] = func()
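A likely explanation for the huge speed gap (and a caveat about the benchmark): in `testUpper`, the right-hand side `func()` is evaluated exactly once, and NumPy broadcasts that single scalar result to every upper-triangle cell. The loop version calls the function once per cell, so the two tests are not doing the same amount of expensive work. The sketch below (with a hypothetical counting stand-in for `ExpensiveFunction`) demonstrates the single-call behavior:

```python
import numpy as np

calls = {"n": 0}

def CountingFunction():
    # hypothetical stand-in for ExpensiveFunction that counts invocations
    calls["n"] += 1
    return 1

num_cases = 100
BigMatrix = np.zeros((num_cases, num_cases))
upper = np.triu_indices_from(BigMatrix)

# The right-hand side runs once; NumPy broadcasts the scalar
# result to all upper-triangle positions at C speed.
BigMatrix[upper] = CountingFunction()

print(calls["n"])          # the function ran only once
print(BigMatrix[0].sum())  # yet the entire first row was filled
```

So `testUpper` is only equivalent to the loop if every cell genuinely gets the same value; if each cell needs its own score, the function must still run once per pair.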

I benchmarked @unutbu's test function from below against the numpy version:

In [8]: %timeit test(ExpensiveFunction)
        1 loops, best of 3: 11.1 s per loop

In [9]: %timeit testUpper(ExpensiveFunction)
        1000 loops, best of 3: 2.03 ms per loop

Solution

  • Here is a simple experiment which shows that the bottleneck is more likely to be ExpensiveFunction:

    import time
    import numpy as np
    
    def SimpleFunction():
        return 1
    
    def ExpensiveFunction():
        time.sleep(0.001)
        return 1
    
    def test(func):
        num_cases = 100
        BigMatrix = np.zeros((num_cases,num_cases))
    
        for CasesX in range(0,num_cases):
            for CasesY in range(CasesX,num_cases):
                BigMatrix[CasesX,CasesY] = func()
    

    In [84]: %timeit test(ExpensiveFunction)
    1 loops, best of 3: 5.48 s per loop
    
    In [85]: %timeit test(SimpleFunction)
    1000 loops, best of 3: 890 µs per loop
    

    The two timeit runs are the same except for the function being called. When func is SimpleFunction, populating BigMatrix takes less than 1ms. But when func is ExpensiveFunction, populating BigMatrix takes over 5s.

    So the double for-loop is probably not the bottleneck; ExpensiveFunction is. You can try it with your actual code to make sure. If ExpensiveFunction does turn out to be the bottleneck, there is no need to optimize the double loop: even if there were a faster way to populate BigMatrix -- even if you could cut the loop's cost to zero -- you would (in the case above) save at most 890 µs, while the overall program would still take over 5 seconds.
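    What the symmetry does buy you is a roughly 2x reduction in calls to ExpensiveFunction itself: compute only the upper triangle (n*(n+1)/2 calls instead of n*n) and mirror it into the lower triangle with cheap array arithmetic. A minimal sketch, using a hypothetical `score(i, j)` in place of the real pairwise function:

    ```python
    import numpy as np

    def score(i, j):
        # hypothetical stand-in for the expensive pairwise score
        return i * 10 + j

    num_cases = 4
    M = np.zeros((num_cases, num_cases))

    # Fill only the upper triangle (diagonal included):
    # n*(n+1)/2 expensive calls instead of n*n.
    for i in range(num_cases):
        for j in range(i, num_cases):
            M[i, j] = score(i, j)

    # Mirror the upper triangle into the lower one; subtract the
    # diagonal once so it is not counted twice.
    full = M + M.T - np.diag(np.diag(M))
    ```

    This halves the expensive work without changing the per-call cost, which the experiment above shows is where the time actually goes.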