Search code examples
pythongisscikit-imagenetwork-analysis

MCP Geometrics for calculating marketsheds


I am trying to calculate marketsheds using the skimage.MCP_geometric find_costs function. It has been working wonderfully to calculate least-cost routes, but rather than finding the travel cost to the nearest source, I want to calculate the index of the nearest source.

Sample Code

import numpy as np
import skimage.graph as graph
import copy

img = np.array([[1,1,2,2],[2,1,1,3],[3,2,1,2],[2,2,2,1]])
mcp = graph.MCP_Geometric(img)

destinations = [[0,0],[3,3]]
costs, traceback = mcp.find_costs(destinations)
print(costs)

[[0.         1.         2.5        4.5       ]
 [1.5        1.41421356 2.41421356 4.        ]
 [4.         2.91421356 1.41421356 1.5       ]
 [5.5        3.5        1.5        0.        ]]

This works as expected, and creates a nice travel cost raster. However, I want (for each cell) to know which of the destinations is the closest. The best solution I have found is to run each of the destinations separately, then combine them through min calculations. It works, but is slow, and has not been working at scale.

all_c = []
for dest in destinations:
    costs, traceback = mcp.find_costs([dest])
    all_c.append(copy.deepcopy(costs))

res = np.dstack(all_c)
res_min = np.amin(res, axis=2)
output = np.zeros([res_min.shape[0], res_min.shape[1]])
for idx in range(0, res.shape[2]):
    cur_data = res[:,:,idx]
    cur_val = (cur_data == res_min).astype(np.byte) * idx
    output = output + cur_val
output = output.astype(np.byte)

print(output)
array([[0, 0, 0, 0],
       [0, 0, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

I have been looking into overloading the functions of MCP_Geometric and MCP_Flexible, but I cannot find anything providing information on the index of the destination.

Hope that provides enough information to replicate and understand what I want to do, thanks!


Solution

  • Ok, this is a bit of a ride, but it was fun to figure out. I'm unclear just how fast it'll be but I think it should be pretty fast in the case of many destinations and comfortably-in-RAM images.

    The key is the traceback return value, which kinda-sorta tells you the neighbor index to get to the nearest destination. So with a bit of pathfinding you should be able to find that destination. Can that be fast? It turns out it can, with a bit of NumPy index wrangling, scipy.sparse matrices, and connected_components from scipy.sparse.csgraph!

    Let's start with your same costs array and both destinations:

    import numpy as np
    
    image = np.array(
        [[1, 1, 2, 2],
         [2, 1, 1, 3],
         [3, 2, 1, 2],
         [2, 2, 2, 1]]
    )
    destinations = [[0, 0], [3, 3]]
    

    We then make the graph, and get the costs and the traceback:

    from skimage import graph
    
    mcp = graph.MCP_Geometric(image)
    costs, traceback = mcp.find_costs(destinations)
    print(traceback)
    

    gives:

    [[-1  4  4  4]
     [ 6  7  7  1]
     [ 6  6  0  1]
     [ 3  3  3 -1]]
    

    Now, I had to look up the documentation for what traceback is:

    Same shape as the costs array; this array contains the offset to any given index from its predecessor index. The offset indices index into the offsets attribute, which is a array of n-d offsets. In the 2-d case, if offsets[traceback[x, y]] is (-1, -1), that means that the predecessor of [x, y] in the minimum cost path to some start position is [x+1, y+1]. Note that if the offset_index is -1, then the given index was not considered.

    For some reason, my mcp object didn't have an offsets attribute — possibly a Cython inheritance bug? Dunno, will investigate later — but searching the source code shows me that offsets is defined with the skimage.graph._mcp.make_offsets function. So I did a bad thing and imported from that private module, so I could claim what was rightfully mine — the offsets list, which translates from numbers in traceback to offsets in the image coordinates:

    from skimage.graph import _mcp
    
    offsets = _mcp.make_offsets(2, True)
    print(offsets)
    

    which gives:

    [array([-1, -1]),
     array([-1,  0]),
     array([-1,  1]),
     array([ 0, -1]),
     array([0, 1]),
     array([ 1, -1]),
     array([1, 0]),
     array([1, 1])]
    

    Now, there's one last thing to do with the offsets: you'll note that destinations are marked in the traceback with "-1", which doesn't correspond to the last element of the offsets array. So we append np.array([0, 0]), and then every value in traceback corresponds to a real offset. In the case of destinations, you get a self-edge, but that's fine.

    offsets.append(np.array([0, 0]))
    offsets_arr = np.array(offsets)  # shape (9, 2)
    

    Now, we can build a graph from offsets, pixel coordinates, and pixel ids. First, we use np.indices to get an index for every pixel in the image:

    indices = np.indices(traceback.shape)
    print(indices.shape)
    

    gives:

    (2, 4, 4)
    

    To get an array that has, for each pixel, the offset to its neighbor, we use fancy array indexing:

    offset_to_neighbor = offsets_arr[traceback]
    print(offset_to_neighbor.shape)
    

    which gives:

    (4, 4, 2)
    

    The axes are different between the traceback and the numpy indices, but nothing a little transposition won't fix:

    neighbor_index = indices - offset_to_neighbor.transpose((2, 0, 1))
    

    Finally, we want to deal with integer pixel ids in order to create a graph of all the pixels, rather than coordinates. For this, we use np.ravel_multi_index.

    ids = np.arange(traceback.size).reshape(image.shape)
    neighbor_ids = np.ravel_multi_index(
        tuple(neighbor_index), traceback.shape
    )
    

    This gives me a unique ID for each pixel, and then a unique "step towards the destination" for each pixel:

    print(ids)
    print(neighbor_ids)
    
    [[ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
     [12 13 14 15]]
    [[ 0  0  1  2]
     [ 0  0  1 11]
     [ 4  5 15 15]
     [13 14 15 15]]
    

    Then we can turn this into a graph using SciPy sparse matrices. We don't care about weights for this graph so we just use the value 1 for the edges.

    from scipy import sparse
    
    g = sparse.coo_matrix((
        np.ones(traceback.size),
        (ids.flat, neighbor_ids.flat),
        shape=(ids.size, ids.size),
    )).tocsr()
    

    (This uses the (value, (row, column)) or (data, (i, j)) input format for sparse COOrdinate matrices.)

    Finally, we use connected components to get the graphs — the groups of pixels that are nearest to each destination. The function returns the number of components and the mapping of "pixel id" to component:

    n, components = sparse.csgraph.connected_components(g)
    basins = components.reshape(image.shape)
    print(basins)
    
    [[0 0 0 0]
     [0 0 0 1]
     [0 0 1 1]
     [1 1 1 1]]
    

    (Note that this result is slightly different from yours because the cost is identical to destination 0 and 1 for the pixels in question, so it's arbitrary which to label.)

    print(costs)
    
    [[0.         1.         2.5        4.5       ]
     [1.5        1.41421356 2.41421356 4.        ]
     [4.         2.91421356 1.41421356 1.5       ]
     [5.5        3.5        1.5        0.        ]]
    

    Hope this helps!