
How to find the time complexity of a connected-components graph algorithm?


I am trying to find the time complexity of the following algorithm. So far I have two options: O(VE) and O(V + E) (where V is the number of vertices and E the number of edges). I just can't figure it out though.

Here is the algorithm:

for each node in the graph:
    set its component to its value;   // O(n)

for every edge in the graph:                  // O(v)
    if its end vertices are in different components  
       set all the nodes in each of their components to the same component        // O(n)

I have the worst-case time complexities of specific parts (indicated above) but not of the whole thing. How do I get the overall time complexity from this? Thanks!

I tried getting the complexity of individual sections of the algorithm. I don't know how to get the overall complexity from this.

I have this alternative algorithm. For an adjacency list, I am guessing this would be O(EV + V^2). I am more confident about this one than the first one, but if someone could confirm that would be great!

components(G):
  for all vertices v ∈ G:
    componentOf[v] = -1
  compID = 0  // component ID
  for all vertices v ∈ G:
    if componentOf[v] == -1:
      dfsComponent(G, v, compID)
      compID = compID + 1
 
dfsComponent(G, v, id):
  componentOf[v] = id
  for each (v,w) ∈ edges(G):
    if componentOf[w] == -1:
      dfsComponent(G, w, id)

Solution

  • As the variables of the input are 𝑉 and 𝐸, something like O(𝑛) shouldn't appear in your analysis. Also, the second (outer) loop has a number of iterations that depends on 𝐸, not 𝑉.

    Secondly, the step "set all the nodes in each of their components to the same component" can be implemented in different ways:

    1. Choose a new component number and update the nodes in both components
    2. Read the component number used in the first component, and update the nodes in the second component to that component number
    3. Check which of the two components is smaller (by maintaining a size per component number), and update the nodes in the smaller component to the component number of the other component (also updating the stored size of that component number).

    Variant 1

    In the first variant, the worst case occurs when each edge we visit connects a single component with a node that is so far a component on its own. That means we have to update the component number of a steadily increasing number of nodes: 2 + 3 + 4 + ... + 𝑉. This is one less than a triangular number, and so represents O(𝑉²). Add to this that we have to visit all edges, also those that connect nodes that are already known to belong to the same component, and we end up with O(𝐸 + 𝑉²).
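    As a sketch (not taken from the question; names are illustrative), variant 1 could look like this in Python, with the full relabelling scan that makes it O(𝐸 + 𝑉²):

    ```python
    # Variant 1 sketch: on each merging edge, assign a FRESH component
    # number to every node in both endpoint components. The inner scan
    # over all nodes is what costs O(V) per merge, O(V^2) overall.

    def connected_components_v1(num_nodes, edges):
        # Initially every node is its own component (component id == node id).
        component = list(range(num_nodes))
        next_id = num_nodes  # fresh component numbers start here

        for v, w in edges:                      # O(E) edge visits
            if component[v] != component[w]:
                old_v, old_w = component[v], component[w]
                # Relabel all nodes of both components with a new number.
                for u in range(num_nodes):
                    if component[u] in (old_v, old_w):
                        component[u] = next_id
                next_id += 1
        return component
    ```

    On a path graph each edge merges a growing component with a fresh singleton, which is exactly the worst case described above.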

    Variant 2

    In the worst case, the second variant has the same complexity as the first, as we may be unlucky and always update the nodes of the large component with the component number of the node that is a component on its own.

    Variant 3

    The third variant improves on this.

    With this variant, what would be a worst case in the previous variants (adding a so-far isolated node to a growing component) becomes a best case here: each added edge then represents O(1) work (updating the component number of one node only). The worst case for this variant is instead when the two nodes belong to components of about the same size. We can imagine sizes 1 + 1 = 2, and 2 + 2 = 4, ...etc. We can see that a specific node will be part of a merge at most log𝑉 times, because the component it belongs to at least doubles in size at each merge it takes part in. This leads to a time complexity of O(𝐸 + 𝑉log𝑉).

    DFS algorithm

    The DFS algorithm you presented at the end is again an improvement. If we consider the graph to be undirected, then an edge will be traversed at most two times: once in each direction, and the second time no recursive call will follow, as the end node will already have been visited (and have a component number). I assume the list of edges (v,w) can be retrieved from an adjacency list in a time that is linear to the number of edges the node v has (which would not be the case when we have an adjacency matrix).

    In terms of nodes we can see we will have to visit each node at least once in the main loop.

    So its time complexity is O(𝐸 + 𝑉).
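    For reference, here is one possible runnable Python version of that pseudocode, assuming the graph is given as an adjacency list (a dict mapping each vertex to its neighbours). It uses an explicit stack instead of recursion, which avoids Python's recursion limit on long paths but is otherwise the same O(𝑉 + 𝐸) traversal:

    ```python
    # Connected components via DFS over an adjacency list: O(V + E).
    # adj: dict mapping each vertex to a list of its neighbours.

    def components(adj):
        component_of = {v: -1 for v in adj}   # -1 means "not yet visited"
        comp_id = 0
        for v in adj:
            if component_of[v] == -1:
                # DFS from v, labelling everything reachable with comp_id.
                stack = [v]
                while stack:
                    u = stack.pop()
                    if component_of[u] != -1:
                        continue
                    component_of[u] = comp_id
                    stack.extend(w for w in adj[u] if component_of[w] == -1)
                comp_id += 1
        return component_of
    ```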