Search code examples
regressionmathematical-optimization

Generate sets of sets of n-dimensional points


I am trying to take sets of points and split them up into smaller sets. The constraint is that each set has some minimum and some maximum for each of their dimensions. I want to generate all possible combinations of these sets (let's call this a set of sets.) When I am done, each point appears in exactly one set in each set o' sets.

As an example, let's say I just have data points that have two independent variables, i and j. They are:

(1,1) (1,2) (2,2) (3,1),(2,1),(2,3)

Any of these splits are fine:

(1,1)(1,2) and (2,2)(3,2)(2,1)(2,3)
First set has i < 2, second set has i >= 2.

(1,1)(3,1)(2,1) and (1,2)(2,2)(2,3)
First set has j < 2, second set has j >= 2.

(1,1)(1,2) and (2,2)(3,1)(2,1) and empty and (2,3)
First set has (i < 2, j < 3), second set has (i >= 2, j < 3)
Third set has (i < 2, j >= 3), fourth set has (i >= 2, j >= 3)

How can I generate the entire set of splits without manually iterating through every point (distinct numbers)! times?

This isn't homework, just a program I am trying to write as part of a data-fitter.


Solution

  • Assuming the intent is to have at most one dividing point in each dimension (thus partitioning the points into at most two sets with respect to that dimension), then:

    For each dimension, let V be the set of coordinates in that dimension. E.g., given the points (4, 10), (4, 20), (6, 10), (6, 18), (7, 3), then, for the first dimension, V is {4, 6, 7}. Iterate v through each value in the set.

    Nest those iterations, one for each dimension, so that, in the body of the innermost loop, you have a v for each dimension. We will number them, v0, v1, v2,...

    Each v forms a criterion: Either x < v or v <= x. For n dimensions, there are 2n combinations of these criteria. Each combination specifies a subset of the original points. E.g., one such subset is { p | x0 < v0 and v1 <= x1 and v2 <= x2 and... }, where the point p has coordinates (x0, x1, x2,...). So, iterate through the 2n potential subsets (some identical since they are empty), and collect them into a set. That is a partition of the original set of points.

    When you are done iterating the v’s through their values, you will have constructed each partition matching your criteria.