I need to check how many separate graphs would be created where "n lines contains pairs of positive integers, where each pair identifies a connection between two vertices in a graph. ". Suppose we have 3 pairs: [4 3], [1 4], [5 6]. The answer will be 2, because [4 3]&[1 4] will merge into one graph: [1 3 4] and the 2nd one is [5 6]. If we add one more pair: [4 3], [1 4], [5 6], [4 6], the answer will be 1 (only 1 graph: [1 3 4 5 6] are connected).
I managed to solve the problem in Java, but it's not efficient, and for more than 10000+ pairs it works terribly slow. The code looks more or less that:
Set<Pair> connectionsSet;
HashSet<TreeSet<Integer>> createdConnections;
public void createGraphs() {
for (Pair pair : connectionsSet) {
boolean foundLeft = false, foundRight = false;
for (TreeSet<Integer> singleSet : createdConnections) {
if (singleSet.contains(pair.getLeft())) foundLeft = true;
if (singleSet.contains(pair.getRight())) foundRight = true;
}
if (!foundLeft && !foundRight)
addNewGraph(pair);
else if (foundLeft && !foundRight)
addToExistingGraph(pair, Constants.LEFT);
else if (!foundLeft && foundRight)
addToExistingGraph(pair, Constants.RIGHT);
else if (foundLeft && foundRight)
mergeGraphs(pair);
}
}
private void addNewGraph(Pair pair) {
createdConnections.add(new TreeSet<>(pair.asList()));
}
private void addToExistingGraph(Pair pair, String side) {
for (TreeSet<Integer> singleSet : createdConnections) {
if (side.equals(Constants.LEFT) && singleSet.contains(pair.getLeft()))
singleSet.add(pair.getRight());
if (side.equals(Constants.RIGHT) && singleSet.contains(pair.getRight()))
singleSet.add(pair.getLeft());
}
}
private void mergeGraphs(Pair pair) {
Optional<TreeSet<Integer>> leftSetOptional = getOptional(pair.getLeft());
Optional<TreeSet<Integer>> rightSetOptional = getOptional(pair.getRight());
if (leftSetOptional.isPresent() && rightSetOptional.isPresent()){
TreeSet<Integer> leftSet = leftSetOptional.get();
TreeSet<Integer> rightSet = rightSetOptional.get();
rightSet.addAll(leftSet);
createdConnections.removeIf(singleSet -> singleSet.contains(pair.getLeft()));
createdConnections.removeIf(singleSet -> singleSet.contains(pair.getRight()));
createdConnections.add(rightSet);
}
}
How can I improve the performance? I'm not asking for a ready-made solution, but maybe there is an algorithm that I don't know about that could significantly speed it up?
To just get the count of connected components, you can use a disjoint set. This is a simple implementation assuming input as a List
of edges.
Map<Integer, Integer> parent;
int find(int x) {
int p = parent.getOrDefault(x, x);
if (p != x) p = find(p);
parent.put(x, p);
return p;
}
public int numConnectedComponents(List<Pair> edges) {
parent = new HashMap<>();
for (var e : edges) {
int lPar = find(e.getLeft()), rPar = find(e.getRight());
parent.put(lPar, rPar);
}
int comps = 0;
for (var k : parent.keySet())
if (find(k) == k) ++comps;
return comps;
}
If the number of nodes (n
) is known and we can assume all node labels are integers between 1
and n
, we can optimize by using an array instead of a Map
and keeping track of the number of connected components while adding edges.
int[] parent;
int find(int x) {
return x == parent[x] ? x : (parent[x] = find(parent[x]));
}
public int numConnectedComponents(int nodes, List<Pair> edges) {
parent = new int[nodes + 1];
int comps = 0;
for (var e : edges) {
if (parent[e.getLeft()] == 0) {
parent[e.getLeft()] = e.getLeft();
++comps;
}
if (parent[e.getRight()] == 0) {
parent[e.getRight()] = e.getRight();
++comps;
}
int lPar = find(e.getLeft()), rPar = find(e.getRight());
if (lPar != rPar) {
parent[lPar] = rPar;
--comps;
}
}
return comps;
}