Search code examples
algorithmlisthaskelloptimizationnp-hard

What do you call the property of a list that describes the degree to which it contains duplicates?


I have a function that selects Cartesian products of lists such that the number of duplicate elements is highest:

import Data.List (nub)

f :: Eq a => [[a]] -> [[a]]
f xss = filter ((==) minLength . length . nub) cartProd
  where
    minLength = minimum (map (length . nub) cartProd)
    cartProd = sequence xss

For example:

*Main> f [[1,2,3],[4],[1,5]]
[[1,4,1]]

But:

*Main> sequence [[1,2,3],[4],[1,5]]
[[1,4,1],[1,4,5],[2,4,1],[2,4,5],[3,4,1],[3,4,5]]

Is there a name for the property that the result of my function f has?


Solution

  • I believe your function is computing a minimum set cover:

    Given a set of elements { 1 , 2 , . . . , n } (called the universe) and a collection S of sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of S whose union equals the universe.

    In your case, n is length xss. There is one set in S for each distinct element x of concat xss, namely the set { i | x `elem` (xss !! i) } of all indices that x occurs in. The minimum set cover then tells you which x to choose from each list in xss (sometimes giving you multiple choices; any choice will produce the same final nubbed length).

    Here is a worked example for your [[1,2,3],[4],[1,5]]:

    The universe is {1,2,3}.

    There are five sets in the collection S; I'll name them S_1 through S_5:

    • S_1 = {1,3} because the first and third lists contain 1.
    • S_2 = {1} because the first list contains 2.
    • S_3 = {1} because the first list contains 3.
    • S_4 = {2} because the second list contains 4.
    • S_5 = {3} because the third list contains 5.

    A minimum set cover for this is {S_1, S_4}. Because this is a set cover, this means every list contains either 1 or 4. Because it is minimal, no other choice of sets produces a smaller final collection of values. So, we can choose either 1 or 4 from each list to produce a final answer. As it happens, no list contains both 1 and 4 so there is only one choice, namely [1,4,1].