I want to find a way to "lasso around" a bunch of contiguous/touching values in a sparse table, and output a set of new tables. If any values are "touching", they should be part of a subarray together.
For example: if I have the following sparse table/array:
[[0 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0]
[0 0 0 1 1 0 0 0 1 1 1 1 1 0 0 1 1 0 0]
[0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 0]
[0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0]]
The algorithm should "find" subtables/subarrays. It would identify them like this:
[[0 0 0 1 1 0 0 0 2 2 2 2 0 0 0 0 0 0 0]
[0 0 0 1 1 0 0 0 2 2 2 2 2 0 0 3 3 0 0]
[0 0 0 0 0 0 0 0 2 2 2 0 0 0 3 3 3 3 0]
[0 0 0 0 0 0 0 2 0 0 0 0 0 3 0 0 0 0 0]]
But the final output should be a series of subarrays/subtables like this:
[[1 1]
[1 1]]
[[0 1 1 1 1 0]
[0 1 1 1 1 1]
[0 1 1 1 0 0]
[1 0 0 0 0 0]]
[[0 0 1 1 0]
[0 1 1 1 1]
[1 0 0 0 0]]
How can I do this in Python? I've tried looking at scikit-image and a few things seem to be similar to what I'm trying to do, but nothing I have seen seems to fit quite right.
EDIT: it looks like scipy.ndimage.label is extremely close to what I want to do, but it will break the corner-case values into their own separate arrays. So it's not quite right.
EDIT: ah ha, the structure argument is what I am after. If I get time I will update my question with an answer.
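For reference, here is a minimal sketch of the scipy.ndimage.label route mentioned in the edit above. It assumes the table is a NumPy integer array named a (the name and the use of ndimage.find_objects to get the bounding slices are my own choices for illustration, not something stated in the question). A 3x3 structure of ones makes diagonal neighbours count as touching.

```python
import numpy as np
from scipy import ndimage

# The example table from the question (the name `a` is an assumption).
a = np.array([
    [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
])

# Default structure: only edge-sharing cells are connected, so the two
# diagonal "corner" cells end up as separate components.
_, n_default = ndimage.label(a)

# 3x3 block of ones: diagonal neighbours are connected as well.
structure = np.ones((3, 3), dtype=int)
labels, n_features = ndimage.label(a, structure=structure)

# find_objects returns one tuple of slices (a bounding box) per label.
subarrays = [a[sl] for sl in ndimage.find_objects(labels)]

for sub in subarrays:
    print(sub)
```

With the default structure the two diagonal cells in the last row are reported as their own components, which is the behaviour described in the first edit.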
A possible solution, based on the following ideas:
First, measure.label assigns a unique label to each connected component in the array based on an 8-connectivity criterion (connectivity=2).
Second, measure.regionprops retrieves properties of these labeled regions, such as their bounding boxes.
Then, the code iterates through each detected region, extracts the minimum and maximum row and column indices from the region's bounding box, and slices the original array a to obtain the corresponding subarray.
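from skimage import measure

# a is the NumPy array from the question
labels = measure.label(a, connectivity=2)
regions = measure.regionprops(labels)
list_suba = []
for region in regions:
    # bbox is (min_row, min_col, max_row, max_col); the max values are exclusive
    min_row, min_col, max_row, max_col = region.bbox
    subarray = a[min_row:max_row, min_col:max_col]
    list_suba.append(subarray)
list_suba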
Or, more concisely:
labels = measure.label(a, connectivity=2)
regions = measure.regionprops(labels)
[a[region.bbox[0]:region.bbox[2], region.bbox[1]:region.bbox[3]]
 for region in regions]
Output:
[array([[1, 1],
        [1, 1]]),
 array([[0, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1],
        [0, 1, 1, 1, 0, 0],
        [1, 0, 0, 0, 0, 0]]),
 array([[0, 0, 1, 1, 0],
        [0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0]])]
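One caveat with slicing a by bounding box: if the bounding box of one region happens to contain cells of a different region, those cells show up in the subarray too. That does not occur in the example from the question, but it can in general. A hedged sketch of one way around it is to multiply each window by region.image, the boolean mask of the region inside its bounding box. The small array below is a made-up example chosen only to trigger the overlap.

```python
import numpy as np
from skimage import measure

# Hypothetical input: the L-shaped region's bounding box covers the whole
# array, so it also contains the isolated cell in the bottom-right corner.
a = np.array([
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 1],
])

labels = measure.label(a, connectivity=2)
regions = measure.regionprops(labels)

masked = []
for region in regions:
    min_row, min_col, max_row, max_col = region.bbox
    window = a[min_row:max_row, min_col:max_col]
    # region.image is True only for cells that belong to this region,
    # so cells of other regions inside the window are zeroed out.
    masked.append(window * region.image)

for sub in masked:
    print(sub)
```

Plain bounding-box slicing would return the full 3x3 array for the first region, including the stray 1 in the corner; the masked version keeps only the L shape.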