I am trying to identify connected features on the globe (i.e., in space-time on a sphere). The cc3d package has gotten me 90% of the way but I am struggling to deal with the date border (i.e., the periodic boundary conditions in one of the three dimensions).
On a 2D map, the problem of my approach becomes apparent (note the connected buy differently labelled region around 0 longitude near the South Pole):
This appears because the data are defined on longitudes from 0 to 360 (while I show shown them from -180 to 180 here to make the problem more apparent).
My two typical solutions for these kinds of problems do not work:
MWE
A break-down of the problem in 2 dimensions should be the following:
import cc3d
import numpy as np
import matplotlib.pyplot as plt
arr = np.sin(np.random.RandomState(0).randn(100).reshape(10, 10)) > .3
labels = cc3d.connected_components(arr)
map = plt.pcolormesh(labels, vmin=.5)
map.cmap.set_under('none')
Here the yellow structure in the right top should be connected to the rest of the top structure and same for the two structures at the bottom. Please keep in mind that any helpful solution should also work for connected features in 3 dimensions.
Any help on how to approach this is appreciated.
Update April 2024: the option for periodic boundary conditions is now implemented in cc3d directly (see comment by SapphireSun below).
Okay I have a fix for the output. Its quite slow and requires to run connected_components
twice but works for me. I'm putting it here in case its helpful for anyone later.
I'll put the fix for the MWE here but it works equivalently for full 3D space-time data (independently for each day). The convention for the data I'm working with is that the longitude is the last dimension, so (time, latitude, longitude) but it is plotted on the x-axis, therefore I do a swapaxes
for all function calls in the MWE example.
Functions
def plot(arr):
map = plt.pcolormesh(arr, vmin=.5)
map.cmap.set_under('none')
for xx in range(len(arr)):
for yy in range(len(arr)):
plt.text(yy+.5, xx+.5, arr[xx, yy],
ha="center", va="center", color="k")
def cut_stitch(arr): # equivalent to switching the antimeridian [0, 360[ -> [-180, 180[
idx = len(arr) // 2
return np.concatenate([arr[idx:], arr[:idx]], axis=0)
def fix_date_border(arr, arr_r):
mask = arr == 0
mapping = {}
for key, value in zip(arr_r[~mask], arr[~mask]):
try:
mapping[key].add(value)
except KeyError:
mapping[key] = {value}
# if there is only one unique value per key its a one-to-one mapping and no problem
conflicts = [list(values) for values in mapping.values() if len(values) != 1]
print(mapping) # DEBUG
for conflict in conflicts:
idx = np.any([arr == cc for cc in conflict[1:]], axis=0)
arr[idx] = conflict[0] # reassign in original (i.e., first) array
return arr
Run component labelling a second time on the rearranged field, then rearrange again:
arr2 = cut_stitch(arr.swapaxes(0, 1)).swapaxes(0, 1)
labels2 = cut_stitch(cc3d.connected_components(arr2).swapaxes(0, 1)).swapaxes(0, 1)
plot(labels) # original
plot(labels2)
Fix the labels:
labels_fixed = fix_date_border(labels.swapaxes(0, 1), labels2.swapaxes(0, 1)).swapaxes(0, 1)
# {2: {1, 2}, 7: {8, 7}, 5: {4}, 3: {3}, 1: {1}, 4: {5}, 8: {7}, 6: {6}}
plot(labels_fixed, labels=True)
Note that the labels will no longer be strictly consecutive in the fixed array!