Search code examples
javaalgorithmgraph-theory

Building graphs based on pairs of connected vertices


I need to check how many separate graphs would be created where "n lines contains pairs of positive integers, where each pair identifies a connection between two vertices in a graph. ". Suppose we have 3 pairs: [4 3], [1 4], [5 6]. The answer will be 2, because [4 3]&[1 4] will merge into one graph: [1 3 4] and the 2nd one is [5 6]. If we add one more pair: [4 3], [1 4], [5 6], [4 6], the answer will be 1 (only 1 graph: [1 3 4 5 6] are connected).

I managed to solve the problem in Java, but it's not efficient, and for more than 10000+ pairs it works terribly slow. The code looks more or less that:

    Set<Pair> connectionsSet;
    HashSet<TreeSet<Integer>> createdConnections;
    
    public void createGraphs() {
        for (Pair pair : connectionsSet) {
            boolean foundLeft = false, foundRight = false;
            for (TreeSet<Integer> singleSet : createdConnections) {
                if (singleSet.contains(pair.getLeft())) foundLeft = true;
                if (singleSet.contains(pair.getRight())) foundRight = true;
            }
            if (!foundLeft && !foundRight)
                addNewGraph(pair);
            else if (foundLeft && !foundRight)
                addToExistingGraph(pair, Constants.LEFT);
            else if (!foundLeft && foundRight)
                addToExistingGraph(pair, Constants.RIGHT);
            else if (foundLeft && foundRight)
                mergeGraphs(pair);
        }
    }

    private void addNewGraph(Pair pair) {
        createdConnections.add(new TreeSet<>(pair.asList()));
    }

    private void addToExistingGraph(Pair pair, String side) {
        for (TreeSet<Integer> singleSet : createdConnections) {
            if (side.equals(Constants.LEFT) && singleSet.contains(pair.getLeft()))
                singleSet.add(pair.getRight());
            if (side.equals(Constants.RIGHT) && singleSet.contains(pair.getRight()))
                singleSet.add(pair.getLeft());
        }
    }

    private void mergeGraphs(Pair pair) {
        Optional<TreeSet<Integer>> leftSetOptional = getOptional(pair.getLeft());
        Optional<TreeSet<Integer>> rightSetOptional = getOptional(pair.getRight());

        if (leftSetOptional.isPresent() && rightSetOptional.isPresent()){
            TreeSet<Integer> leftSet = leftSetOptional.get();
            TreeSet<Integer> rightSet = rightSetOptional.get();

            rightSet.addAll(leftSet);

            createdConnections.removeIf(singleSet -> singleSet.contains(pair.getLeft()));
            createdConnections.removeIf(singleSet -> singleSet.contains(pair.getRight()));
            createdConnections.add(rightSet);

        }
    }

How can I improve the performance? I'm not asking for a ready-made solution, but maybe there is an algorithm that I don't know about that could significantly speed it up?


Solution

  • To just get the count of connected components, you can use a disjoint set. This is a simple implementation assuming input as a List of edges.

    Map<Integer, Integer> parent;
    int find(int x) {
        int p = parent.getOrDefault(x, x);
        if (p != x) p = find(p);
        parent.put(x, p);
        return p;
    }
    public int numConnectedComponents(List<Pair> edges) {
        parent = new HashMap<>();
        for (var e : edges) {
            int lPar = find(e.getLeft()), rPar = find(e.getRight());
            parent.put(lPar, rPar);
        }
        int comps = 0;
        for (var k : parent.keySet())
            if (find(k) == k) ++comps;
        return comps;
    }
    

    If the number of nodes (n) is known and we can assume all node labels are integers between 1 and n, we can optimize by using an array instead of a Map and keeping track of the number of connected components while adding edges.

    int[] parent;
    int find(int x) {
        return x == parent[x] ? x : (parent[x] = find(parent[x]));
    }
    public int numConnectedComponents(int nodes, List<Pair> edges) {
        parent = new int[nodes + 1];
        int comps = 0;
        for (var e : edges) {
            if (parent[e.getLeft()] == 0) {
                parent[e.getLeft()] = e.getLeft();
                ++comps;
            }
            if (parent[e.getRight()] == 0) {
                parent[e.getRight()] = e.getRight();
                ++comps;
            }
            int lPar = find(e.getLeft()), rPar = find(e.getRight());
            if (lPar != rPar) {
                parent[lPar] = rPar;
                --comps;
            }
        }
        return comps;
    }