python gis scikit-image network-analysis

MCP Geometrics for calculating marketsheds

I am trying to calculate marketsheds using the skimage.MCP_geometric find_costs function. It has been working wonderfully to calculate least-cost routes, but rather than finding the travel cost to the nearest source, I want to calculate the index of the nearest source.

Sample Code

import numpy as np
import skimage.graph as graph
import copy

img = np.array([[1,1,2,2],[2,1,1,3],[3,2,1,2],[2,2,2,1]])
mcp = graph.MCP_Geometric(img)

destinations = [[0,0],[3,3]]
costs, traceback = mcp.find_costs(destinations)
print(costs)

[[0.         1.         2.5        4.5       ]
 [1.5        1.41421356 2.41421356 4.        ]
 [4.         2.91421356 1.41421356 1.5       ]
 [5.5        3.5        1.5        0.        ]]

This works as expected, and creates a nice travel cost raster. However, I want (for each cell) to know which of the destinations is the closest. The best solution I have found is to run each of the destinations separately, then combine them through min calculations. It works, but is slow, and has not been working at scale.

all_c = []
for dest in destinations:
    costs, traceback = mcp.find_costs([dest])
    all_c.append(copy.deepcopy(costs))

res = np.dstack(all_c)
res_min = np.amin(res, axis=2)
output = np.zeros([res_min.shape[0], res_min.shape[1]])
for idx in range(0, res.shape[2]):
    cur_data = res[:,:,idx]
    cur_val = (cur_data == res_min).astype(np.byte) * idx
    output = output + cur_val
output = output.astype(np.byte)

print(output)
array([[0, 0, 0, 0],
       [0, 0, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

I have been looking into overloading the functions of MCP_Geometric and MCP_Flexible, but I cannot find anything providing information on the index of the destination.

Hope that provides enough information to replicate and understand what I want to do, thanks!

Solution

Ok, this is a bit of a ride, but it was fun to figure out. I'm unclear just how fast it'll be but I think it should be pretty fast in the case of many destinations and comfortably-in-RAM images.

The key is the traceback return value, which kinda-sorta tells you the neighbor index to get to the nearest destination. So with a bit of pathfinding you should be able to find that destination. Can that be fast? It turns out it can, with a bit of NumPy index wrangling, scipy.sparse matrices, and connected_components from scipy.sparse.csgraph!

Let's start with your same costs array and both destinations:

import numpy as np

image = np.array(
    [[1, 1, 2, 2],
     [2, 1, 1, 3],
     [3, 2, 1, 2],
     [2, 2, 2, 1]]
)
destinations = [[0, 0], [3, 3]]

We then make the graph, and get the costs and the traceback:

from skimage import graph

mcp = graph.MCP_Geometric(image)
costs, traceback = mcp.find_costs(destinations)
print(traceback)

gives:

[[-1  4  4  4]
 [ 6  7  7  1]
 [ 6  6  0  1]
 [ 3  3  3 -1]]

Now, I had to look up the documentation for what traceback is:

Same shape as the costs array; this array contains the offset to any given index from its predecessor index. The offset indices index into the offsets attribute, which is a array of n-d offsets. In the 2-d case, if offsets[traceback[x, y]] is (-1, -1), that means that the predecessor of [x, y] in the minimum cost path to some start position is [x+1, y+1]. Note that if the offset_index is -1, then the given index was not considered.

For some reason, my mcp object didn't have an offsets attribute — possibly a Cython inheritance bug? Dunno, will investigate later — but searching the source code shows me that offsets is defined with the skimage.graph._mcp.make_offsets function. So I did a bad thing and imported from that private module, so I could claim what was rightfully mine — the offsets list, which translates from numbers in traceback to offsets in the image coordinates:

from skimage.graph import _mcp

offsets = _mcp.make_offsets(2, True)
print(offsets)

which gives:

[array([-1, -1]),
 array([-1,  0]),
 array([-1,  1]),
 array([ 0, -1]),
 array([0, 1]),
 array([ 1, -1]),
 array([1, 0]),
 array([1, 1])]

Now, there's one last thing to do with the offsets: you'll note that destinations are marked in the traceback with "-1", which doesn't correspond to the last element of the offsets array. So we append np.array([0, 0]), and then every value in traceback corresponds to a real offset. In the case of destinations, you get a self-edge, but that's fine.

offsets.append(np.array([0, 0]))
offsets_arr = np.array(offsets)  # shape (9, 2)

Now, we can build a graph from offsets, pixel coordinates, and pixel ids. First, we use np.indices to get an index for every pixel in the image:

indices = np.indices(traceback.shape)
print(indices.shape)

gives:

(2, 4, 4)

To get an array that has, for each pixel, the offset to its neighbor, we use fancy array indexing:

offset_to_neighbor = offsets_arr[traceback]
print(offset_to_neighbor.shape)

which gives:

(4, 4, 2)

The axes are different between the traceback and the numpy indices, but nothing a little transposition won't fix:

neighbor_index = indices - offset_to_neighbor.transpose((2, 0, 1))

Finally, we want to deal with integer pixel ids in order to create a graph of all the pixels, rather than coordinates. For this, we use np.ravel_multi_index.

ids = np.arange(traceback.size).reshape(image.shape)
neighbor_ids = np.ravel_multi_index(
    tuple(neighbor_index), traceback.shape
)

This gives me a unique ID for each pixel, and then a unique "step towards the destination" for each pixel:

print(ids)
print(neighbor_ids)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
[[ 0  0  1  2]
 [ 0  0  1 11]
 [ 4  5 15 15]
 [13 14 15 15]]

Then we can turn this into a graph using SciPy sparse matrices. We don't care about weights for this graph so we just use the value 1 for the edges.

from scipy import sparse

g = sparse.coo_matrix((
    np.ones(traceback.size),
    (ids.flat, neighbor_ids.flat),
    shape=(ids.size, ids.size),
)).tocsr()

(This uses the (value, (row, column)) or (data, (i, j)) input format for sparse COOrdinate matrices.)

Finally, we use connected components to get the graphs — the groups of pixels that are nearest to each destination. The function returns the number of components and the mapping of "pixel id" to component:

n, components = sparse.csgraph.connected_components(g)
basins = components.reshape(image.shape)
print(basins)

[[0 0 0 0]
 [0 0 0 1]
 [0 0 1 1]
 [1 1 1 1]]

(Note that this result is slightly different from yours because the cost is identical to destination 0 and 1 for the pixels in question, so it's arbitrary which to label.)

print(costs)

[[0.         1.         2.5        4.5       ]
 [1.5        1.41421356 2.41421356 4.        ]
 [4.         2.91421356 1.41421356 1.5       ]
 [5.5        3.5        1.5        0.        ]]

Hope this helps!