Search code examples
algorithmcombinationspowerset

Grouping items into subsets (power set)


Let's say I have the following:

john: [a, b, c, d]
bob:  [a, c, d, e]
mary: [a, b, e, f]

Or reformatted slightly so you can easily see the groupings:

john: [a, b, c, d]
bob:  [a,    c, d, e]
mary: [a, b,       e, f]

What's the most common or most efficient algorithm to generate the following grouping?

[john, bob, mary]: [a]
[john, mary]:      [b]
[john, bob]:       [c,d]
[bob, mary]:       [e]
[mary]:            [f]
[john]:            []
[bob]:             []

After quickly googling, it seems like the above keys represent the "power set". So I was planning on the following impl:

1) Generate power set {{j, b, m}, {j, m}, {j, b} {b, m}, {m}, {j}, {b}} // j = john, b = bob, m = mary

2) Generate set of all letters: {a, b, c, d, e, f}

3) Iterate over subsets, and for each letter, see if letter exists in all elements of the subset

So...

subset = {j, b, m}

letter = a
    j contains a? true
    b contains a? true
    m contains a? true
        * add a to subset {j, b, m}

letter = b
    j contains b? true
    b contains b? false, continue

letter = c
    j contains c? true
    b contains c? true
    m contains c? false, continue
.....

subset = {j, m}
.....

Is there a better solution?

EDIT: The above algorithm is flawed. For instance, {j, m} would also contain "a", which I don't want. I suppose I can simply modify it so that in each iteration, I also check to see if this letter is "NOT IN" the elements not in this set. So in this case, I would also check:

if b does not contain a

Solution

  • Step 3 (Iterate over subsets) would be inefficient, as it does either "j contain a" or "a not in j" for each element in your power set.

    Here is what I would suggest:

    1) Generate power set {{j, b, m}, {j, m}, {j, b} {b, m}, {m}, {j}, {b}}. You don't need this step is you don't care about empty mappings in the final output.

    2) Iterate over all the elements in your original data structure and construct following:

    [a] : [j, b, m]
    [b] : [j, m]
    [c] : [j, b]
    [d] : [j, b]
    [e] : [b, m]
    [f] : [m]
    

    3) Reverse the above structure and aggregate (using maps of [j, b,...] to list of [a,b...] should do the trick) to get this:

    [j, b, m] : [a]
    [j, m] : [b]
    [j, b] : [c, d]
    [b, m] : [e]
    [m] : [f]
    

    4) Compare 3 with 1 to fill in the remaining empty mappings.

    Edit: Hers is the complete code in Java

        // The original data structure. mapping from "john" to [a, b, c..] 
        HashMap<String, HashSet<String>> originalMap = new HashMap<String, HashSet<String>>();
    
        // The final data structure. mapping from power set to [a, b...]
        HashMap<HashSet<String>, HashSet<String>> finalMap = new HashMap<HashSet<String>, HashSet<String>>();
    
        // Intermediate data structure. Used to hold [a] to [j,b...] mapping
        TreeMap<String, HashSet<String>> tmpMap = new TreeMap<String, HashSet<String>>();
    
        // Populate the original dataStructure
        originalMap.put("john", new HashSet<String>(Arrays.asList("a", "b", "c", "d")));
        originalMap.put("bob", new HashSet<String>(Arrays.asList("a", "c", "d", "e")));
        originalMap.put("mary", new HashSet<String>(Arrays.asList("a", "b", "e", "f")));
    
        // Hardcoding the powerset below. You can generate the power set using the algorithm used in googls guava library.
        // powerSet function in https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/Sets.java
        // If you don't care about empty mappings in the finalMap, then you don't even have to create the powerset
        finalMap.put(new HashSet<String>(Arrays.asList("john", "bob", "mary")), new HashSet<String>());
        finalMap.put(new HashSet<String>(Arrays.asList("john", "bob")), new HashSet<String>());
        finalMap.put(new HashSet<String>(Arrays.asList("bob", "mary")), new HashSet<String>());
        finalMap.put(new HashSet<String>(Arrays.asList("john", "mary")), new HashSet<String>());
        finalMap.put(new HashSet<String>(Arrays.asList("john")), new HashSet<String>());
        finalMap.put(new HashSet<String>(Arrays.asList("bob")), new HashSet<String>());
        finalMap.put(new HashSet<String>(Arrays.asList("mary")), new HashSet<String>());
    
        // Iterate over the original map to prepare the tmpMap.
        for(Entry<String, HashSet<String>> entry : originalMap.entrySet()) {
            for(String value : entry.getValue()) {
                HashSet<String> set = tmpMap.get(value);
                if(set == null) {
                    set = new HashSet<String>();
                    tmpMap.put(value, set);
                }
                set.add(entry.getKey());
            }
        }
    
        // Iterate over the tmpMap result and add the values to finalMap
        for(Entry<String, HashSet<String>> entry : tmpMap.entrySet()) {
            finalMap.get(entry.getValue()).add(entry.getKey());
        }
    
        // Print the output
        for(Entry<HashSet<String>, HashSet<String>> entry : finalMap.entrySet()) {
            System.out.println(entry.getKey() +" : "+entry.getValue());
        }
    

    The output of the code above would be :

    [bob] : []
    [john] : []
    [bob, mary] : [e]
    [bob, john] : [d, c]
    [bob, john, mary] : [a]
    [mary] : [f]
    [john, mary] : [b]