Search code examples
optimizationcombinationsmathematical-optimizationcombinatorics

How can I optimize finding the shortest description of a set


The goal is to find the optimal description of a subset of countries, using a combination of individual countries and groups of countries that are predefined.

Set of countries is comprised of all 2 letter ISO country codes, (US/DE/NL/etc)

Groups of countries are fixed, for example:

  • GROUP1: NL,BE,LU
  • GROUP2: NL,BE,LU,CZ,US,FI,NO,GI,FR,DK

I want to have a representation of this list of countries:

  • CZ,US,FI,NO,GI,FR,DK,DE

I would like to shorten this using the defined groups, to the representation with the least amount of elements. For this example that would be:

  • +GROUP2 +DE -GROUP1

Because if we expand this back out we get:

  • Add GROUP2: NL,BE,LU,CZ,US,FI,NO,GI,FR,DK
  • Add DE
  • Remove GROUP1: NL,BE,LU Which brings us back to CZ,US,FI,NO,GI,FR,DK,DE

Any pointers to libraries or examples are welcome in any programming language

I'm at a loss what a good approach would be. I've tried looking into existing algorithms, comparable problems, but I can't seem to wrap my head around the problem and find a good fit.

Just to illustrate a simple brute force solution that will obviously not scale:

let countries = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
let groups = {
  G1: ['C1', 'C2', 'C3', 'C4', 'C5'],
  G2: ['C1', 'C4'],
}

let output = getBest(['C2', 'C3', 'C5', 'C6'])
// output == ["+C6", "+G1", "-G2"]

function getBest(input) {
  const ids = countries.concat(Object.keys(groups))

  let best = input
  for (const t of combinations(ids)) {
    if (expand(t, groups).sort().toString() == input.toString()) {
      if (t.length < best.length) best = [...t]
    }
  }

  return best
}

// Expands short form to long form
function expand(s, groups) {
  return Array.from(
    s.sort().reduce((acc, curr) => {
      let symbol = curr[0]
      let id = curr.slice(1)
      if (groups[id]) {
        curr = groups[id]
      } else {
        curr = [id]
      }

      if (symbol == '+') {
        return new Set([...acc, ...curr])
      } else {
        return new Set([...acc].filter((a) => !curr.includes(a)))
      }
    }, new Set())
  )
}

// Yields all possible short form options
function* combinations(array) {
  array = ['+', '-'].reduce((acc, curr) => {
    return acc.concat(array.map((s) => curr + s))
  }, [])

  for (const subset of subsets(array)) {
    yield subset
  }
}

// Creates powerset of array
function* subsets(array, offset = 0) {
  while (offset < array.length) {
    let first = array[offset++]
    for (let subset of subsets(array, offset)) {
      subset.push(first)
      yield subset
    }
  }
  yield []
}

Solution

  • To me, the problem does not sound like there exists a classical model for it with a well-known polynomial-time algorithm.

    A reasonable approach looks as follows.

    Consider formulas in a formal language, like (G1 subtract G2) unite (G3 intersect G4), where Gi are predefined groups (and perhaps individual countries, but that will slow the solution down a lot).

    Each formula's score is its length plus the size of difference with the desired answer (so that we add or subtract individual elements as the last step of the formula).

    Now, for formulas of lengths 0, 1, 2, ..., (iterative deepening), recursively generate all possible formulas of such length and consider their score. Stop when the length reaches the score of the best answer so far.

    There's some room to optimize (for example, prune clearly stupid branches like Gi symdiff Gi, or perhaps memoize the shortest formula for each set we can obtain), but the solution is nevertheless exponential in the number of items and groups.