For a monte carlo simulation, I have 500 instances of Person, each which have a location (cartesian coordinates) as attribute. Throughout the simulation, I need to access the distances between pairs of two persons multiple times. I already have defined a function to compute the (cartesian) distance between pairs of two persons (for simplicity lets call this function 'distance(loc1,loc2)'). I am interested in making the script computationally more efficient. A first step I took is creating a symmetric matrix which stores the distances rather than computing the distance each time it is needed. For now I decided to make it a nested numpy array; I could change it to a nested list or something else if it makes things easier. (I made the matrix using an adapted version of this: Python: Generating a symmetric array with list comprehension) The matrix of distances looks something like this:
np.array([[0, 6.44177991, 2.74762143, 3.47162016, 2.0645646 ],
[6.44177991, 0, 1.59860905, 8.99027864, 2.58449879],
[2.74762143 , 1.59860905, 0, 2.06833575, 8.53594684],
[3.47162016, 8.99027864 , 2.06833575, 0, 6.76594943],
[2.0645646, 2.58449879, 8.53594684, 6.76594943, 0]])
During the simulation, the locations of people (ocasionally) change. When this happens, I would need to 'update' the matrix. I currently approach this using a for loop (see below), but I was wondering if there was a more efficient approach to replace the values.
#note: population is a list containing the 500 Person entities
#e.g. if person5's location changes:
p_id = 5
for i in range(len(population)):
if i!=p_id:
new_distance=distance(population[i].location, population[p_id].location)
distance_matrix[p_id][i] = new_distance
distance_matrix[i][p_id] = new_distance
Although you didn't ask for it, you could find the distance between points in a very simple way using Numpy's broadcasting. This is implemented via the np.newaxis
function, which simply add another (empty) dimension to the array, so that the arithmetic operation is performed on the broadcasted array (which saves memory). In the code below there is no need for loops since we simply index only the rows (and columns) of the distance matrix that are changed (corresponding to the modified persons locations) in every step.
I hope it is indeed more efficient than your initial approach (haven't timed it), although perhaps further optimization could be done (last two lines bug me a little, not gonna lie).
Note that in the test code below I assumed your Person
coordinates are displayed in the random arrays x
and y
, while p_id
indicates the indices of the persons that have changed locations.
import numpy as np
np.random.seed(1) # Fix seed for reproducibility
x = np.random.random(10) # Test x-coordinates array
y = np.random.random(10) # Test y-coordinates array
# Calculate distances
dists = np.sqrt((x - x[:,np.newaxis])**2 + (y - y[:,np.newaxis])**2)
# Test boolean array of persons IDs that changed locations
p_id = x > 0.5 # Just a dummy condition
x[p_id] = np.random.random(p_id.sum()) # Updated test x-coordinates
x[p_id] = np.random.random(p_id.sum()) # Updated test y-coordinates
dists_updated = np.sqrt((x - x[p_id,np.newaxis])**2 +
(y - y[p_id,np.newaxis])**2)
# Update distance matrix
dists[p_id] = dists_updated
dists[:,p_id] = dists_updated.T