python performance scikit-image area overlapping

Python (scikit-image): Most performant way to determine most common color in a masked area

I'm modding a strategy game and have two RGB images. One defines the province areas on the map, each province being a unique color. The other image defines the terrain of the map.

I have a python script that compares these two images to determine the terrain type of each province, by checking the most common terrain-color in each province.

For example, I might check province of color (0, 0, 255) and find that its area is entirely full of grassland pixels on the terrain map.

This is how I'm currently doing it:

from skimage import io
import numpy

# Takes an RGB map of shape (x, y, colour channels) and converts it to a 2D array of shape (x, y), where each item is an int
# referring to the colour it represents, as indexed in the returned unique colours list.
def get_inverse_map(rgb_map, to_tuples = True):  # named rgb_map to avoid shadowing the built-in map()
    original_map_shape = (rgb_map.shape[0], rgb_map.shape[1])
    flattened_map = rgb_map.reshape(-1, rgb_map.shape[2])
    unique_map_cols, first_occurrences, map_inverses, index_counts = numpy.unique(flattened_map, return_index = True, return_inverse = True, return_counts = True, axis = 0)
    unflattened_inverse_map = map_inverses.reshape(original_map_shape)

    if to_tuples:
        unique_map_tuples = [tuple(col) for col in unique_map_cols]
        return unflattened_inverse_map, unique_map_tuples, index_counts
    else:
        return unflattened_inverse_map, unique_map_cols, index_counts


province_map = io.imread(self.province_map_dir)
terrain_map = io.imread(self.terrain_map_dir)

inverse_province_map, unique_province_cols = get_inverse_map(province_map)[ : 2]
inverse_terrain_map, unique_terrain_cols = get_inverse_map(terrain_map)[ : 2]

for p in range(len(unique_province_cols)):  # p is the province's index in the index map, not its RGB tuple
    province_mask = inverse_province_map == p
    province_terrain_pixels = inverse_terrain_map[province_mask]

    occurrences = numpy.bincount(province_terrain_pixels)
    mode_index = numpy.argmax(occurrences) # Index into unique_terrain_cols of the most common terrain colour within province_mask

I convert the images from RGB arrays (shape: x, y, 3) to index maps (shape: x, y) for quicker comparisons.
I get a mask of the province I want to evaluate.
I get the pixels in the terrain map that fall within the province map.
I use bincount and argmax to find the most common one.
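
To make the masking, bincount and argmax steps concrete, here is a toy example. The two small index maps are made up for illustration and stand in for the real outputs of get_inverse_map.

```python
import numpy as np

# Hypothetical 3x4 index maps; the values are invented for illustration only.
inverse_province_map = np.array([[0, 0, 1, 1],
                                 [0, 0, 1, 1],
                                 [0, 1, 1, 1]])
inverse_terrain_map = np.array([[2, 2, 0, 0],
                                [2, 1, 0, 0],
                                [2, 0, 0, 1]])

# Boolean mask of the pixels that belong to province index 0.
province_mask = inverse_province_map == 0

# Terrain indices of exactly those pixels (a 1D array).
province_terrain_pixels = inverse_terrain_map[province_mask]

# counts[i] is how often terrain index i occurs inside the province.
counts = np.bincount(province_terrain_pixels)

# The most common terrain index inside the province.
mode_index = int(np.argmax(counts))
```

Each pass of the loop builds a mask the size of the whole map, which is why the cost grows with the number of provinces.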

Because I can't always guarantee that only one terrain color will be within a province's bounds (sometimes I paint over the edges), I need to find the most common color instead of just checking one pixel.

I do this on very large images, so despite my use of index maps to speed things up, it still takes time. The majority of time seems to be lost on get_inverse_map, presumably numpy.unique though I'm not sure.
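
A rough way to check where the time goes is to time numpy.unique in isolation. The sketch below does that on a synthetic image (the size and palette are arbitrary assumptions, not the real maps) and, for comparison, times a variant that first packs each RGB triple into a single integer so that numpy.unique can work on a 1D array instead of on rows. Whether the packed variant is faster, and by how much, should be measured on the real data.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a map: 256x256 pixels drawn from 50 random colours (assumed sizes).
palette = rng.integers(0, 256, size=(50, 3), dtype=np.uint8)
image = palette[rng.integers(0, 50, size=(256, 256))]
flat = image.reshape(-1, 3)

# Variant 1: row-wise unique, as in get_inverse_map.
start = time.perf_counter()
unique_rows, inverse_rows = np.unique(flat, return_inverse=True, axis=0)
row_seconds = time.perf_counter() - start

# Variant 2: pack (r, g, b) into one integer per pixel, then run a 1D unique.
start = time.perf_counter()
flat_int = flat.astype(np.uint32)
packed = (flat_int[:, 0] << 16) | (flat_int[:, 1] << 8) | flat_int[:, 2]
unique_packed, inverse_packed = np.unique(packed, return_inverse=True)
packed_seconds = time.perf_counter() - start

# Unpack the unique integers back into RGB triples.
unique_colors = np.stack(
    [(unique_packed >> 16) & 255, (unique_packed >> 8) & 255, unique_packed & 255],
    axis=1,
).astype(np.uint8)

print(f"row-wise unique: {row_seconds:.4f} s, packed unique: {packed_seconds:.4f} s")
```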

Is there a quicker way to go about this?

Solution

The problem with your approach is that it iterates through the whole map once for each province. My solution iterates through it only once.

Assuming you know:

the number of provinces, N (if you don't know it, assume some maximum number of provinces)
the number of possible terrain colors, C

Do:

Create a 2D numpy array province_terrain_stats of size N x C to accumulate your statistics, each row representing a province and each column a terrain color.
Iterate through the whole map once. At each pixel, read its province index and terrain color index, and add 1 to the corresponding cell of province_terrain_stats. You may need to create a color2idx helper map. Important: use the numba package to code this loop over the whole map. A plain Python loop over every pixel is very slow, so numba is a must if you want good performance from this solution.
For each province in province_terrain_stats, find the most common terrain using np.argmax.
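
The steps above could be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: it assumes the maps have already been converted to integer index maps (for example with get_inverse_map from the question), and the names accumulate_terrain_stats and most_common_terrain are hypothetical. If numba is not installed, the fallback decorator lets the sketch run as plain Python, which is only practical for small test inputs.

```python
import numpy as np

try:
    from numba import njit  # compiles the pixel loop to machine code
except ImportError:
    def njit(func):  # fallback so the sketch still runs (slowly) without numba
        return func


@njit
def accumulate_terrain_stats(province_idx, terrain_idx, stats):
    """Single pass over the map: count terrain indices per province, in place."""
    height, width = province_idx.shape
    for y in range(height):
        for x in range(width):
            stats[province_idx[y, x], terrain_idx[y, x]] += 1


def most_common_terrain(province_idx, terrain_idx, n_provinces, n_terrains):
    """Return (modes, stats); modes[p] is the most common terrain index in province p."""
    stats = np.zeros((n_provinces, n_terrains), dtype=np.int64)  # N x C table
    accumulate_terrain_stats(province_idx, terrain_idx, stats)
    return stats.argmax(axis=1), stats


# Hypothetical 3x4 index maps with 2 provinces and 3 terrain colours.
province_idx = np.array([[0, 0, 1, 1],
                         [0, 0, 1, 1],
                         [0, 1, 1, 1]])
terrain_idx = np.array([[2, 2, 0, 0],
                        [2, 1, 0, 0],
                        [2, 0, 0, 1]])

modes, stats = most_common_terrain(province_idx, terrain_idx, n_provinces=2, n_terrains=3)
```

Here modes[p] is an index into the list of unique terrain colors, so the actual color of province p can be looked up from that list afterwards.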