Search code examples
algorithmsortingpseudocode

Best way to reduce a matcher


I have a matcher like so:

type matcher struct {
    all  []int
    none []int
    any  [][]int
}

For an array of integers to match, it must contain all of the integers in all, none of the integers in none, and must contain at least one integer from each array of integers in any. I have a factory to make the construction of such a matcher easy, but once I have the factory fields I need to "flatten" the structure to make sure none of the terms are contradictory. Right now, what I have is (pseudocode):

all = [0, 1, 2, 3] // means all of these numbers must be present for the matcher to match
any = [[4, 5], [6, 7, 8]] // means one number from each of these groups must be present
none = [9,10] // means none of these may be present
sort(all)
sort(none)
if len(intersections(all, none)) != 0 {
    panic("All and none are contradictory!")
}

for gIndex, group in any {
    sort(group)
    for index, integer in group {
        // If all contains the integer...
        if binarySearch(all, integer) {
            swap(group[len(group)-1], group[index]) // swap the last item with the integer...
            group = group[:len(all)-1] // and truncate the group by one; most efficient way to delete an item.
        }
        // If none contains the integer...
        if binarySearch(none, integer) {
            panic("item in any is also found in none; possible contradiction")
        }
    }
    if len(group) < 1 {
        // delete group from any
    } else if len(group) == 1 {
        // put group[0] (only integer in it) into all, since it must be met
    } else {
        sort(group) // Re-sort the group
        any[gIndex] = group // Update the group
    }
}
removeDuplicates(all)
removeDuplicates(none)
sort2D(any) // Sort by their greatest member
removeDuplicates2D(any) // Go through and remove any duplicate arrays.

all, any, and none should now be reduced to their simplest form, so that checking if an integer array matches should be more efficient and without contradictions.

Is there a better way to do this, and am I missing anything here?


Solution

  • There are at least two flaws in your solution.

    1. You are removing integers in all from the any groups. In fact you have to remove the whole group instead of removing an item if there is an intersection with all. any groups containing items from all are just redundant.

      Having overlaps with the none group is not necessarily a problem. These items could in fact just be removed from the any groups. But then you might want to throw an error when this leads to an empty any group. A matcher with an empty any group won't match anything. Anyway, don't just quietly remove empty any groups!

    2. Something similar applies for any groups subsuming other any groups. Consider this choice of any:

      any = [[4, 5], [4, 5, 6]]
      

      Let's assume that all and none are empty. Then this would remain unchanged by your algorithm. However, the second group is clearly redundant. Any array containing at least one element of the first group will also match the second. This means that at least you have to remove all any arrays containing a superset of others.

    I'm confident that this leads to a canonical form that you can use to compare matchers. However, I may be wrong.