javasetminimum-spanning-treekruskals-algorithm

# Problem removing from Java-HashSet while implementing Kruskal's algorithm

in the following code I am trying to implement Kruskal's algorithm to compute the minimal spanning tree of a graph.

The problem is that removing sets from the connected components does not work properly and that the case if(!startSet.equals(endSet)) always seems to be executed.

Now the code:

``````    import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

import org.jgrapht.Graph;
import org.jgrapht.graph.AsSubgraph;
import org.jgrapht.graph.DefaultWeightedEdge;
import java.util.PriorityQueue;

public class mySpanningTree {

//Receives a graph and computes a minimum spanning tree
public static AsSubgraph<Integer, DefaultWeightedEdge> computeMST(Graph<Integer, DefaultWeightedEdge> graph) {
AsSubgraph<Integer, DefaultWeightedEdge> tree = new AsSubgraph<Integer, DefaultWeightedEdge>(graph, graph.vertexSet(), new HashSet<DefaultWeightedEdge>());

PriorityQueue<DefaultWeightedEdge> edgeQueue = new PriorityQueue<>(Comparator.comparingDouble(graph::getEdgeWeight));
Set<Set<Integer>> connectedComponents = new HashSet<>();
for(Integer i : graph.vertexSet()){
Set<Integer> set = new HashSet<>();
}
int n = tree.vertexSet().size() - 1;
while (!edgeQueue.isEmpty() && tree.edgeSet().size() < n) {
DefaultWeightedEdge edge = edgeQueue.poll();
Integer start = graph.getEdgeSource(edge);
Integer end = graph.getEdgeTarget(edge);
Set<Integer> startSet =  null;
Set<Integer> endSet = null;
for(Set<Integer> set: connectedComponents){
if(set.contains(start)){
startSet = set;
}
if(set.contains(end)){
endSet = set;
}
}
if(!startSet.equals(endSet)){
connectedComponents.remove(endSet);
}
else{
}
}
return tree;
}
``````

The idea of the code is to keep track of the connected components in a set. Whenever an edge connects two nodes from different connected components, I want to union the two connected components and store the result inside the set instead of the two components from before. Furthermore, the edge shall be added. Whenever an edge has two endpoints in one and the same set, it shall not be added (since adding it would create a cycle).

Any help is appreciated!

Solution

• The value of `connectedComponents` is a mutable set, whose elements are also mutable sets. The problem is, that by destructively modifying the elements of `connectedComponents` without "notifying" the container, the container's data structures are silently corrupted.

Hash sets are managed data structures, that use an integer value (the `hashCode`) derived from the contents of their elements to guide where they put the object internally in their private data structures. This mechanism gives fast insertion and retrieval (usually in the order of `O(1)`). However, for this to work, the `hashCode` must not change while the object is present in the container.

In your code, the container is a `HashSet<HashSet<Integer>>` and the elements are `HashSet<Integer>` instances. For Java's sets, the hash code is computed based on the what the set currently contains. So, when you do

``````Set<Integer> set = new HashSet<>();
``````

the `connectedComponents` container computes a hash code from `set` and uses that to determine a place to put the object into in its internal data structures. However, when you do

``````startSet.addAll(endSet);
``````

you mutate the contents of `startSet` which in turn causes its hash code to change. From `connectedComponent`'s perspective, the object is now "in the wrong place" in its internal data structures, and hence it is likely, that it will not be found again. The important thing here is, that data structure like `HashSet` are based on `equals` and `hashCode`, and for many data structures (including `HashSet`) those are based on the actual contents, not merely on object identity.

To address this problem, try removing the element set from `connectedComponents` before mutating it, and then add it back afterward:

``````if(!startSet.equals(endSet)){
connectedComponents.remove(startSet);  // Added back below ...
connectedComponents.remove(endSet);
}
``````

Alternatively, it might help to keep track of the element sets using a different container, like an `IdentityHashMap` (use whatever you like as the value). This would work here, since you do not need the container to track the element sets by their content (as `HashSet` implicitly does).

For example:

``````IdentityHashMap<Set<Integer>, Object> connectedComponents = new IdentityHashMap<>();

...

for(Integer i : graph.vertexSet()){
Set<Integer> set = new HashSet<>();
connectedComponents.put(set, set);
}

...

for(Set<Integer> set: connectedComponents.keySet()) {
...
if(set.contains(start)){
startSet = set;
}
if(set.contains(end)){
endSet = set;
}
}

if(!startSet.equals(endSet)) {
connectedComponents.remove(endSet);
I am not sure right now, whether it should remain `!startSet.equals(endSet)` or whether you might want to use `startSet != endSet` with the modified code, though. The former version uses "deep equality" for the test (i.e., considers the actual contents the the sets involved) whereas the latter version works by object identity only, which I think would be more appropriate if you consider the `IdentityHashMap` route.