Strongly Connected Components
10 Mar 2019 Graph Tree Algorithm DFSStrongly connected components refer to connected components in a directed graph, and is useful to simplify a directed graph, for instance, merging all vertices in the same strongly connected components to be a big vertex, could produce a Directed Acyclic Graph (DAG) with less number of vertices. This post documents two well-known algorithms (i.e., Kosaraju’s Algorithm and Tarjan’s Algorithm) that solve the strongly connected component problem.
Problem Description
Connection components refers to those vertices in a graph that can reach each other through a path, which is consist of edges. They are strongly connected components if the edges are directed. The problem is to count the number of strongly connected components (or list vertices in each strongly connected component) in a given directed graph.
Kosaraju’s Algorithm
Kosaraju’s Algorithm does DFS twice to find all the strongly connected components. The first DFS push vertices into a stack in the post-order (parent after all its chilren, see my previous post Interative Tree Traversal (DFS)). The second DFS takes root vertex from the stack and is performed on a transposed graph, and every actual traversal triggered (there are chances that a vertex is visited before it is popped out to be root) would cover a strongly connected component. Here is an example implementation:
To my understanding, Kosaraju’s Algorithm works because it takes advantage of the following observations:
- Transposing graph does not change strongly connected components;
- A parent vertex (denoted by P) in the first DFS is guaranteed (by the post-order stack) to be root earilier than its children in the second DFS, so whenever a vertex (denoted by C) is visited in the second DFS, we are sure there is a path from P to C in the first DFS;
- There is a path from the root (P) to a vertex (C) on the transposed graph (in the second DFS), equals to there is a path from the C to P in the original graph;
The time complexity of Kosaraju’s algorithm is Θ(|V|+|E|) for adjacency list graph, O(|V|^2) for adjacency matrix.
Tarjan’s Algorithm
Transposing the graph and do DFS twice (Kosaraju’s Algorithm) may take time.
Tarjan’s Algorithm reduces it to one DFS.
First of all,
Tarjan’s Algorithm records current DFS path in a stack,
which would go through the same push/pop operations as the DFS stack but at different time points due to the different pop out conditions.
In addition,
Tarjan’s Algorithm also maintains two propeties,
desc
and lowlink
, for each vertex.
- The
desc
values increase as DFS visits more vertices, indicating the pre-order (parent before all its chilren, see my previous post Interative Tree Traversal (DFS)) of vertices visited in the DFS. - The
lowlink
value is initially assigned with thedesc
of the vertex, then is lowered as more vertices in the the vertex’s subgraph are visited.
A strongly connected component may be detected upon the traversal of a vertex is done (no more child vertex to visit).
If the vertex’s desc
equals to lowlink
,
which means this vertex is the one cannot reach back to any of its ancestor,
vertices in the stack are popped out until this vertex,
the root of the strongly connected component discovered.
Here comes an example implementation of Tarjan’s Algorithm:
We can see the conditions to lower lowlink
values are:
- visit an untouched vertex and perform DFS on it getting a lower
lowlink
value on that child vertex; - touch a visited vertex which is on the path to current vertex;
It worths mentioning that in the latter condition
the original paper chose to update lowlink
value with the visited vertex’s desc
value rather than lowlink
value.
I was confused and thought only using lowlink
makes sense.
However, practice tells me they do not make difference on the correctness of final result.
And the StackExchange thread here has a counterexample to illustrate updating with lowlink
cannot guarantee all vertices’ lowlink
value to be the root of the strongly connected component,
while using desc
gives lowlink
a meaningful definition:
LOWLINK(v) is the smallest vertex which is in the same component as v and is reachable by traversing zero or more tree arcs followed by at most one frond or cross-link.
To my understanding, Tarjan’s Algorithm finds the strongly connected component by catching the “one-way fork”.
As mentioned before, the stack Tarjan’s Algorithm maintained has different pop-out condition
(only triggered upon knowing a vertex is a “one-way fork”)
than the stack for DFS.
The vertices whose lowlink
is less than desc
would stay in the stack until they get assigned to a strongly connected component,
even they are done with traversal.
But if a vertex’s lowlink
still equals to its desc
after exhausting all potential ways looping back,
we are sure this vertex would be a “one-way fork”.
And because the special pop-out condition,
this vertex’s sub-graph,
the strongly connected component it belongs to,
is still on the stack.
We can easily poping out the vertices until this “one-way fork” vertex to get the current strongly connected component.
The time complexity of Kosaraju’s algorithm is O(|V|+|E|) for adjacency list graph, and the space complexity is O(|V|).
More solutions
In addition to aforementioned Kosaraju’s Algorithm and Tarjan’s Algorithm, there is one more DFS-based linear-time solution from Dijkstra’s book. Also there are reachability-based parallel algorithms worth one more post in the future.
Credits
- GeeksforGeeks - Thanks for offering a platform where I can see editorials and test my code.
- Tarjan’s SCC : example showing necessity of lowlink definition and calculation rule? - Thank Thophane Vallaeys for your answer that solves my confusion on Tarjan’s Algorithm’s weird choice.
- DEPTH-FIRST SEARCH AND LINEAR GRAPH ALGORITHMS - The origional paper for Tarjan’s Algorithm.
- Kosaraju’s algorithm - WikiPedia for Kosaraju’s Algorithm.
- Tarjan’s strongly connected components algorithm - WikiPedia for Tarjan’s Algorithm.
- Strongly connected component - WikiPedia for Strongly Connected Component problem.