I am trying to calculate marketsheds using the skimage.MCP_geometric find_costs function. It has been working wonderfully to calculate least-cost routes, but rather than finding the travel cost to the nearest source, I want to calculate the index of the nearest source.
Sample Code
import numpy as np
import skimage.graph as graph
import copy
img = np.array([[1,1,2,2],[2,1,1,3],[3,2,1,2],[2,2,2,1]])
mcp = graph.MCP_Geometric(img)
destinations = [[0,0],[3,3]]
costs, traceback = mcp.find_costs(destinations)
print(costs)
[[0. 1. 2.5 4.5 ]
[1.5 1.41421356 2.41421356 4. ]
[4. 2.91421356 1.41421356 1.5 ]
[5.5 3.5 1.5 0. ]]
This works as expected, and creates a nice travel cost raster. However, I want (for each cell) to know which of the destinations is the closest. The best solution I have found is to run each of the destinations separately, then combine them through min calculations. It works, but is slow, and has not been working at scale.
all_c = []
for dest in destinations:
costs, traceback = mcp.find_costs([dest])
all_c.append(copy.deepcopy(costs))
res = np.dstack(all_c)
res_min = np.amin(res, axis=2)
output = np.zeros([res_min.shape[0], res_min.shape[1]])
for idx in range(0, res.shape[2]):
cur_data = res[:,:,idx]
cur_val = (cur_data == res_min).astype(np.byte) * idx
output = output + cur_val
output = output.astype(np.byte)
print(output)
array([[0, 0, 0, 0],
[0, 0, 1, 1],
[0, 1, 1, 1],
[1, 1, 1, 1]], dtype=int8)
I have been looking into overloading the functions of MCP_Geometric and MCP_Flexible, but I cannot find anything providing information on the index of the destination.
Hope that provides enough information to replicate and understand what I want to do, thanks!
Ok, this is a bit of a ride, but it was fun to figure out. I'm unclear just how fast it'll be but I think it should be pretty fast in the case of many destinations and comfortably-in-RAM images.
The key is the traceback
return value, which kinda-sorta tells you the neighbor index to get to the nearest destination. So with a bit of pathfinding you should be able to find that destination. Can that be fast? It turns out it can, with a bit of NumPy index wrangling, scipy.sparse
matrices, and connected_components
from scipy.sparse.csgraph
!
Let's start with your same costs array and both destinations:
import numpy as np
image = np.array(
[[1, 1, 2, 2],
[2, 1, 1, 3],
[3, 2, 1, 2],
[2, 2, 2, 1]]
)
destinations = [[0, 0], [3, 3]]
We then make the graph, and get the costs and the traceback:
from skimage import graph
mcp = graph.MCP_Geometric(image)
costs, traceback = mcp.find_costs(destinations)
print(traceback)
gives:
[[-1 4 4 4]
[ 6 7 7 1]
[ 6 6 0 1]
[ 3 3 3 -1]]
Now, I had to look up the documentation for what traceback is:
Same shape as the
costs
array; this array contains the offset to any given index from its predecessor index. The offset indices index into theoffsets
attribute, which is a array of n-d offsets. In the 2-d case, if offsets[traceback[x, y]] is (-1, -1), that means that the predecessor of [x, y] in the minimum cost path to some start position is [x+1, y+1]. Note that if the offset_index is -1, then the given index was not considered.
For some reason, my mcp
object didn't have an offsets attribute — possibly a Cython inheritance bug? Dunno, will investigate later — but searching the source code shows me that offsets is defined with the skimage.graph._mcp.make_offsets
function. So I did a bad thing and imported from that private module, so I could claim what was rightfully mine — the offsets list, which translates from numbers in traceback
to offsets in the image coordinates:
from skimage.graph import _mcp
offsets = _mcp.make_offsets(2, True)
print(offsets)
which gives:
[array([-1, -1]),
array([-1, 0]),
array([-1, 1]),
array([ 0, -1]),
array([0, 1]),
array([ 1, -1]),
array([1, 0]),
array([1, 1])]
Now, there's one last thing to do with the offsets: you'll note that destinations are marked in the traceback with "-1", which doesn't correspond to the last element of the offsets array. So we append np.array([0, 0])
, and then every value in traceback
corresponds to a real offset. In the case of destinations, you get a self-edge, but that's fine.
offsets.append(np.array([0, 0]))
offsets_arr = np.array(offsets) # shape (9, 2)
Now, we can build a graph from offsets, pixel coordinates, and pixel ids. First, we use np.indices
to get an index for every pixel in the image:
indices = np.indices(traceback.shape)
print(indices.shape)
gives:
(2, 4, 4)
To get an array that has, for each pixel, the offset to its neighbor, we use fancy array indexing:
offset_to_neighbor = offsets_arr[traceback]
print(offset_to_neighbor.shape)
which gives:
(4, 4, 2)
The axes are different between the traceback and the numpy indices, but nothing a little transposition won't fix:
neighbor_index = indices - offset_to_neighbor.transpose((2, 0, 1))
Finally, we want to deal with integer pixel ids in order to create a graph of all the pixels, rather than coordinates. For this, we use np.ravel_multi_index
.
ids = np.arange(traceback.size).reshape(image.shape)
neighbor_ids = np.ravel_multi_index(
tuple(neighbor_index), traceback.shape
)
This gives me a unique ID for each pixel, and then a unique "step towards the destination" for each pixel:
print(ids)
print(neighbor_ids)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0 0 1 2]
[ 0 0 1 11]
[ 4 5 15 15]
[13 14 15 15]]
Then we can turn this into a graph using SciPy sparse matrices. We don't care about weights for this graph so we just use the value 1 for the edges.
from scipy import sparse
g = sparse.coo_matrix((
np.ones(traceback.size),
(ids.flat, neighbor_ids.flat),
shape=(ids.size, ids.size),
)).tocsr()
(This uses the (value, (row, column)) or (data, (i, j)) input format for sparse COOrdinate matrices.)
Finally, we use connected components to get the graphs — the groups of pixels that are nearest to each destination. The function returns the number of components and the mapping of "pixel id" to component:
n, components = sparse.csgraph.connected_components(g)
basins = components.reshape(image.shape)
print(basins)
[[0 0 0 0]
[0 0 0 1]
[0 0 1 1]
[1 1 1 1]]
(Note that this result is slightly different from yours because the cost is identical to destination 0 and 1 for the pixels in question, so it's arbitrary which to label.)
print(costs)
[[0. 1. 2.5 4.5 ]
[1.5 1.41421356 2.41421356 4. ]
[4. 2.91421356 1.41421356 1.5 ]
[5.5 3.5 1.5 0. ]]
Hope this helps!