Search code examples
pythonperformancepython-2.7cpu-speed

fast python matrix creation and iteration


I need to create a matrix starting from the values of a weight matrix. Which is the best structure to hold the matrix in term of speed both when creating and iterating over it? I was thinking about a list of lists or a numpy 2D array but they both seem slow to me. What I need:

numpy array
A = np.zeros((dim, dim))
for r in range(A.shape[0]):
    for c in range(A.shape[0]):
        if(r==c):
            A.itemset(node_degree[r])
        else:
            A.itemset(arc_weight[r,c])

or

list of lists
l = []
for r in range(dim):
    l.append([])
    for c in range(dim):
        if(i==j):
            l[i].append(node_degree[r])
        else:
            l[i].append(arc_weight[r,c])

where dim can be also 20000 , node_degree is a vector and arc_weight is another matrix. I wrote it in c++, it takes less less than 0.5 seconds while the others two in python more than 20 seconds. I know python is not c++ but I need to be as fast as possible. Thank you all.


Solution

  • One thing is you shouldn't be appending to the list if you already know it's size.

    Preallocate the memory first using list comprehension and generate the r, c values using xrange() instead of range() since you are using Python < 3.x (see here):

    l = [[0 for c in xrange(dim)] for r in xrange(dim)]
    

    Better yet, you can build what you need in one shot using:

    l = [[node_degree[r] if r == c else arc_weight[r,c] 
                for c in xrange(dim)] for r in xrange(dim)]
    

    Compared to your original implementation, this should use less memory (because of the xrange() generators), and less time because you remove the need to reallocating memory by specifying the dimensions up front.