Let's say I have the following:
john: [a, b, c, d]
bob: [a, c, d, e]
mary: [a, b, e, f]
Or reformatted slightly so you can easily see the groupings:
john: [a, b, c, d]
bob: [a, c, d, e]
mary: [a, b, e, f]
What's the most common or most efficient algorithm to generate the following grouping?
[john, bob, mary]: [a]
[john, mary]: [b]
[john, bob]: [c,d]
[bob, mary]: [e]
[mary]: [f]
[john]: []
[bob]: []
After quickly googling, it seems like the above keys represent the "power set". So I was planning on the following impl:
1) Generate power set {{j, b, m}, {j, m}, {j, b} {b, m}, {m}, {j}, {b}} // j = john, b = bob, m = mary
2) Generate set of all letters: {a, b, c, d, e, f}
3) Iterate over subsets, and for each letter, see if letter exists in all elements of the subset
So...
subset = {j, b, m}
letter = a
j contains a? true
b contains a? true
m contains a? true
* add a to subset {j, b, m}
letter = b
j contains b? true
b contains b? false, continue
letter = c
j contains c? true
b contains c? true
m contains c? false, continue
.....
subset = {j, m}
.....
Is there a better solution?
EDIT: The above algorithm is flawed. For instance, {j, m} would also contain "a", which I don't want. I suppose I can simply modify it so that in each iteration, I also check to see if this letter is "NOT IN" the elements not in this set. So in this case, I would also check:
if b does not contain a
Step 3 (Iterate over subsets) would be inefficient, as it does either "j contain a" or "a not in j" for each element in your power set.
Here is what I would suggest:
1) Generate power set {{j, b, m}, {j, m}, {j, b} {b, m}, {m}, {j}, {b}}. You don't need this step is you don't care about empty mappings in the final output.
2) Iterate over all the elements in your original data structure and construct following:
[a] : [j, b, m]
[b] : [j, m]
[c] : [j, b]
[d] : [j, b]
[e] : [b, m]
[f] : [m]
3) Reverse the above structure and aggregate (using maps of [j, b,...] to list of [a,b...] should do the trick) to get this:
[j, b, m] : [a]
[j, m] : [b]
[j, b] : [c, d]
[b, m] : [e]
[m] : [f]
4) Compare 3 with 1 to fill in the remaining empty mappings.
Edit: Hers is the complete code in Java
// The original data structure. mapping from "john" to [a, b, c..]
HashMap<String, HashSet<String>> originalMap = new HashMap<String, HashSet<String>>();
// The final data structure. mapping from power set to [a, b...]
HashMap<HashSet<String>, HashSet<String>> finalMap = new HashMap<HashSet<String>, HashSet<String>>();
// Intermediate data structure. Used to hold [a] to [j,b...] mapping
TreeMap<String, HashSet<String>> tmpMap = new TreeMap<String, HashSet<String>>();
// Populate the original dataStructure
originalMap.put("john", new HashSet<String>(Arrays.asList("a", "b", "c", "d")));
originalMap.put("bob", new HashSet<String>(Arrays.asList("a", "c", "d", "e")));
originalMap.put("mary", new HashSet<String>(Arrays.asList("a", "b", "e", "f")));
// Hardcoding the powerset below. You can generate the power set using the algorithm used in googls guava library.
// powerSet function in https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/Sets.java
// If you don't care about empty mappings in the finalMap, then you don't even have to create the powerset
finalMap.put(new HashSet<String>(Arrays.asList("john", "bob", "mary")), new HashSet<String>());
finalMap.put(new HashSet<String>(Arrays.asList("john", "bob")), new HashSet<String>());
finalMap.put(new HashSet<String>(Arrays.asList("bob", "mary")), new HashSet<String>());
finalMap.put(new HashSet<String>(Arrays.asList("john", "mary")), new HashSet<String>());
finalMap.put(new HashSet<String>(Arrays.asList("john")), new HashSet<String>());
finalMap.put(new HashSet<String>(Arrays.asList("bob")), new HashSet<String>());
finalMap.put(new HashSet<String>(Arrays.asList("mary")), new HashSet<String>());
// Iterate over the original map to prepare the tmpMap.
for(Entry<String, HashSet<String>> entry : originalMap.entrySet()) {
for(String value : entry.getValue()) {
HashSet<String> set = tmpMap.get(value);
if(set == null) {
set = new HashSet<String>();
tmpMap.put(value, set);
}
set.add(entry.getKey());
}
}
// Iterate over the tmpMap result and add the values to finalMap
for(Entry<String, HashSet<String>> entry : tmpMap.entrySet()) {
finalMap.get(entry.getValue()).add(entry.getKey());
}
// Print the output
for(Entry<HashSet<String>, HashSet<String>> entry : finalMap.entrySet()) {
System.out.println(entry.getKey() +" : "+entry.getValue());
}
The output of the code above would be :
[bob] : []
[john] : []
[bob, mary] : [e]
[bob, john] : [d, c]
[bob, john, mary] : [a]
[mary] : [f]
[john, mary] : [b]