Topological Sorting

Topological sorting is an operation performed on Directed Acyclic Graph (DAG) to generate a order of all vertices, so that vertex v would not appear before vertex u if there exists a path from vertex u to vertex v.

Given a DAG, there could be more than one result from topological sorting. Here is an example from wikipedia.

DAG having more than one possible result of topological sorting

  • 5, 7, 3, 11, 8, 2, 9, 10 (visual left-to-right, top-to-bottom)
  • 3, 5, 7, 8, 11, 2, 9, 10 (smallest-numberd available vertex first)
  • 5, 7, 3, 8, 11, 10, 9, 2 (fewest edges first)
  • 7, 5, 11, 3, 10, 8, 9, 2 (largest-numbered available vertex first)
  • 5, 7, 11, 2, 3, 8, 9, 10 (attapting top-to-bottom, left-to-right)
  • 3, 7, 8, 5, 11, 10, 2, 9 (arbitrary)

DFS

The straight forward idea is to do Depth First Search (DFS) pushing vertices into a stack in the post-order (parent after all children, see Iterative Tree Traversal (DFS) for an example implementation). After all vertices get into the stack, pop them out will give a topological order. Here comes an example:

Kahn’s Algorithm

Another algorithm to do topological sort is Kahn’s Algorithm. The algorithm maintances a queue to hold vertices, which have no incoming edge or all incoming edges have been pruned. Every iteration it remove one vertex from the queue and append it to topological order, then prune the edge from this vertex to its children. The algorithm needs to store incoming edge list as well, so that it can check whether the vertex is ready get into the queue after prune an edge. Here is the pesudo-code from wikipedia:

L <- Empty list that will contain the sorted elements
S <- Set of all nodes with no incoming edge
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle or not all connected)
else
    return L (a topologically sorted order)

Applications

Topological sorting is as known as “dependency resolution”, because Date-Flow Graph (DFG) is usually represented as a DAG, where dependencies become directed edges. Therefore, topological sorting is useful for data flow related study.

Shortest path is another graph processing problem which can do its computation based on the results of topological sorting. From the root vertex, which should be the first one in the result of topological sorting, we can decide the distance for each vertex one by one. Topological order guarantees when we calculate the distance of a vertex, the distances of all vertices that have edges to the vertex have been determined.

Credits