Search code examples
recursionclojurenpset-cover

Minimal `set cover` solution in Clojure


I've been trying ways to distill (a lot of) database index suggestions into a set of indexes that are applicable to most databases. To do that it turns out I need to solve a pretty basic but NP complete set theory problem: the minimum set cover problem.

This means that given a set of sets, select a subset of sets s that covers a certain domain u, but if u isn't given make it the union of s. The most optimal subset of sets is the one reaching a certain minimum, usually the minimal amount of sets, but it could also be a minimum in total weight if the sets are weighted.

(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(def u (apply set/union s))
(set-cover s u)
=> (#{7 1 4} #{6 2 5} #{3} #{6 8})

I implemented a naive version of this by using clojure.math.combinatorics, relying on it returning subsets in order of increasing amounts of sets.

(defn set-cover
  ([s]
   (set-cover s (apply set/union s)))
  ([s u]
   (->> s
        (combo/subsets)
        (filter (fn [s] (= u (apply set/union s))))
        first)))

However this is very slow on larger s, because of the NP nature and the recurring unions (even optimized ones). For my use-case a version supporting weighted sets would also be preferable.

Looking into optimized versions most trails ended up in thesis-land, which I'm regrettably not smart enough for. I found this small python implementation on SO

def setCover(setList,target=None):
    if not setList: return None
    if target  is None: target  = set.union(*setList)
    bestCover = []
    for i,values in enumerate(setList):
        remaining = target - values
        if remaining == target: continue
        if not remaining: return [values]
        subCover = setCover(setList[i+1:],remaining)
        if not subCover: continue
        if not bestCover or len(subCover)<len(bestCover)-1:
            bestCover = [values] + subCover
    return bestCover

It ticks many boxes:

  • work recursively
  • compares partial results as optimization
  • seems suitable for different minimum definitions: count or weight
  • has additional optimizations I can grok
    • which can be done outside of the basic algorithm
      • sorting input sets on high minimum score (size, weight)
      • identifying unique singleton sets in u not found in other sets

I have been trying to translate this into Clojure as a loop-recur function, but couldn't get the basic version of it to work, since there are niggling paradigm gaps between the two languages.

Does anyone have suggestions how I could go about solving this problem in Clojure, either by tips how to convert the python algorithm successfully, or which other Clojure (or even Java) libraries I could use and how ?


Solution

  • Here's a Clojure version of the greedy set cover algorithm i.e. selects a set which covers the most uncovered elements at each iteration. Rather than use loop/recur to build the complete result, it lazily returns each result element using lazy-seq:

    (defn greedy-set-cover
      ([xs]
       (greedy-set-cover xs (apply set/union xs)))
      ([xs us]
       (lazy-seq
         (when (seq us)
           (let [x (apply max-key #(count (set/intersection us %)) xs)
                 xs' (disj xs x)
                 us' (set/difference us x)]
             (cons x (greedy-set-cover xs' us')))))))
    
    (def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
    
    (greedy-set-cover s)   ;; = (#{7 1 4} #{6 2 5} #{6 8} #{3})