I have to implement a data structure that groups the elements of a equivalence classes.
The API:
interface Grouper<T>{
void same(T l, T r);
Set<EquivalenceClass<T>> equivalenceClasses();
}
interface EquivalenceClass<T>{
Set<T> members();
}
For example the grouping behaves like this:
Grouper g;
g.same(a, b);
g.equivalenceClasses() -> [[a,b]]
g.same(b, a);
g.equivalenceClasses() -> [[a,b]]
g.same(b, c);
g.equivalenceClasses() -> [[a,b,c]]
g.same(d, e);
g.equivalenceClasses() -> [[a,b,c], [d,e]]
g.same(c, d);
g.equivalenceClasses() -> [[a,b,c,d]]
I'm looking for an implementation that works up to ~10 million entries. It should be optimized to fill it and get the equivalence classes once.
Take a look at Union-Find. The union ("same") can be done trivially in O(log N)
, and can be done in effectively O(1)
with some optimizations. The "equivalenceClasses" is O(N)
, which is the cost of visiting everything anyways.