Strongly Connected Components

Strongly connected components refer to connected components in a directed graph, and is useful to simplify a directed graph, for instance, merging all vertices in the same strongly connected components to be a big vertex, could produce a Directed Acyclic Graph (DAG) with less number of vertices. This post documents two well-known algorithms (i.e., Kosaraju’s Algorithm and Tarjan’s Algorithm) that solve the strongly connected component problem.

Problem Description

Connection components refers to those vertices in a graph that can reach each other through a path, which is consist of edges. They are strongly connected components if the edges are directed. The problem is to count the number of strongly connected components (or list vertices in each strongly connected component) in a given directed graph.

Kosaraju’s Algorithm

Kosaraju’s Algorithm does DFS twice to find all the strongly connected components. The first DFS push vertices into a stack in the post-order (parent after all its chilren, see my previous post Interative Tree Traversal (DFS)). The second DFS takes root vertex from the stack and is performed on a transposed graph, and every actual traversal triggered (there are chances that a vertex is visited before it is popped out to be root) would cover a strongly connected component. Here is an example implementation:

To my understanding, Kosaraju’s Algorithm works because it takes advantage of the following observations:

  1. Transposing graph does not change strongly connected components;
  2. A parent vertex (denoted by P) in the first DFS is guaranteed (by the post-order stack) to be root earilier than its children in the second DFS, so whenever a vertex (denoted by C) is visited in the second DFS, we are sure there is a path from P to C in the first DFS;
  3. There is a path from the root (P) to a vertex (C) on the transposed graph (in the second DFS), equals to there is a path from the C to P in the original graph;

The time complexity of Kosaraju’s algorithm is Θ(|V|+|E|) for adjacency list graph, O(|V|^2) for adjacency matrix.

Tarjan’s Algorithm

Transposing the graph and do DFS twice (Kosaraju’s Algorithm) may take time. Tarjan’s Algorithm reduces it to one DFS. First of all, Tarjan’s Algorithm records current DFS path in a stack, which would go through the same push/pop operations as the DFS stack but at different time points due to the different pop out conditions. In addition, Tarjan’s Algorithm also maintains two propeties, desc and lowlink, for each vertex.

  • The desc values increase as DFS visits more vertices, indicating the pre-order (parent before all its chilren, see my previous post Interative Tree Traversal (DFS)) of vertices visited in the DFS.
  • The lowlink value is initially assigned with the desc of the vertex, then is lowered as more vertices in the the vertex’s subgraph are visited.

A strongly connected component may be detected upon the traversal of a vertex is done (no more child vertex to visit). If the vertex’s desc equals to lowlink, which means this vertex is the one cannot reach back to any of its ancestor, vertices in the stack are popped out until this vertex, the root of the strongly connected component discovered.

Here comes an example implementation of Tarjan’s Algorithm:

We can see the conditions to lower lowlink values are:

  • visit an untouched vertex and perform DFS on it getting a lower lowlink value on that child vertex;
  • touch a visited vertex which is on the path to current vertex;

It worths mentioning that in the latter condition the original paper chose to update lowlink value with the visited vertex’s desc value rather than lowlink value. I was confused and thought only using lowlink makes sense. However, practice tells me they do not make difference on the correctness of final result. And the StackExchange thread here has a counterexample to illustrate updating with lowlink cannot guarantee all vertices’ lowlink value to be the root of the strongly connected component, while using desc gives lowlink a meaningful definition:

LOWLINK(v) is the smallest vertex which is in the same component as v and is reachable by traversing zero or more tree arcs followed by at most one frond or cross-link.

To my understanding, Tarjan’s Algorithm finds the strongly connected component by catching the “one-way fork”. As mentioned before, the stack Tarjan’s Algorithm maintained has different pop-out condition (only triggered upon knowing a vertex is a “one-way fork”) than the stack for DFS. The vertices whose lowlink is less than desc would stay in the stack until they get assigned to a strongly connected component, even they are done with traversal. But if a vertex’s lowlink still equals to its desc after exhausting all potential ways looping back, we are sure this vertex would be a “one-way fork”. And because the special pop-out condition, this vertex’s sub-graph, the strongly connected component it belongs to, is still on the stack. We can easily poping out the vertices until this “one-way fork” vertex to get the current strongly connected component.

The time complexity of Kosaraju’s algorithm is O(|V|+|E|) for adjacency list graph, and the space complexity is O(|V|).

More solutions

In addition to aforementioned Kosaraju’s Algorithm and Tarjan’s Algorithm, there is one more DFS-based linear-time solution from Dijkstra’s book. Also there are reachability-based parallel algorithms worth one more post in the future.

Credits