Search code examples
pythonmultidimensional-arraynested-listsprocessing-efficiency

How do I efficiently change entries in a matrix/nested list?


For a monte carlo simulation, I have 500 instances of Person, each which have a location (cartesian coordinates) as attribute. Throughout the simulation, I need to access the distances between pairs of two persons multiple times. I already have defined a function to compute the (cartesian) distance between pairs of two persons (for simplicity lets call this function 'distance(loc1,loc2)'). I am interested in making the script computationally more efficient. A first step I took is creating a symmetric matrix which stores the distances rather than computing the distance each time it is needed. For now I decided to make it a nested numpy array; I could change it to a nested list or something else if it makes things easier. (I made the matrix using an adapted version of this: Python: Generating a symmetric array with list comprehension) The matrix of distances looks something like this:

np.array([[0, 6.44177991, 2.74762143, 3.47162016, 2.0645646 ],
       [6.44177991, 0, 1.59860905, 8.99027864, 2.58449879],
       [2.74762143 , 1.59860905, 0, 2.06833575, 8.53594684],
       [3.47162016, 8.99027864 , 2.06833575, 0, 6.76594943],
       [2.0645646, 2.58449879, 8.53594684, 6.76594943, 0]])

During the simulation, the locations of people (ocasionally) change. When this happens, I would need to 'update' the matrix. I currently approach this using a for loop (see below), but I was wondering if there was a more efficient approach to replace the values.

#note: population is a list containing the 500 Person entities
#e.g. if person5's location changes:
p_id = 5

for i in range(len(population)):
    if i!=p_id:    
        new_distance=distance(population[i].location, population[p_id].location)
        distance_matrix[p_id][i] = new_distance
        distance_matrix[i][p_id] = new_distance


Solution

  • Although you didn't ask for it, you could find the distance between points in a very simple way using Numpy's broadcasting. This is implemented via the np.newaxis function, which simply add another (empty) dimension to the array, so that the arithmetic operation is performed on the broadcasted array (which saves memory). In the code below there is no need for loops since we simply index only the rows (and columns) of the distance matrix that are changed (corresponding to the modified persons locations) in every step.

    I hope it is indeed more efficient than your initial approach (haven't timed it), although perhaps further optimization could be done (last two lines bug me a little, not gonna lie).

    Note that in the test code below I assumed your Person coordinates are displayed in the random arrays x and y, while p_id indicates the indices of the persons that have changed locations.

    import numpy as np
    
    np.random.seed(1)         # Fix seed for reproducibility
    x = np.random.random(10)  # Test x-coordinates array
    y = np.random.random(10)  # Test y-coordinates array
    
    # Calculate distances
    dists = np.sqrt((x - x[:,np.newaxis])**2 + (y - y[:,np.newaxis])**2)
    
    # Test boolean array of persons IDs that changed locations
    p_id = x > 0.5    # Just a dummy condition
    x[p_id] = np.random.random(p_id.sum())    # Updated test x-coordinates
    x[p_id] = np.random.random(p_id.sum())    # Updated test y-coordinates
    dists_updated = np.sqrt((x - x[p_id,np.newaxis])**2 +
                            (y - y[p_id,np.newaxis])**2)
    # Update distance matrix
    dists[p_id]   = dists_updated
    dists[:,p_id] = dists_updated.T