Graph traversal


Suppose we want to find a path from one vertex to another in a graph, e.g., a route in a map from a destination to a goal, or a sequence of mutual friends connecting you to someone else. That's the problem of graph search. There are many approaches, depending on the information we have available. When all we have is the graph structure itself (i.e., nothing to help us know or guess how close we are to the goal), we're essentially doing "blind search".

The algorithms we'll look at today work equally well with directed or undirected graphs, or mixed graphs. Imagine a road system, where some highways have bidirectional traffic, but some streets are one-way.

Code files for today: GraphTraversal.java;

Slides from class

Depth first search

Depth-first search (DFS) is like what you probably do in exploring a maze. You keep going deeper and deeper in the maze, making choices at the various forks in the road, until you hit a dead-end. Then you back up to one of the choice points and make a different choice. [Example in class.] If you're fairy-tale clever, you'll scatter some bread crumbs behind you, so that you can see if you've come back to the same point and are just walking in circles. Then you'd want to do the same thing as if you hit a dead end.

To make this intuitive description a little more precise, we rely on a stack. For DFS we do the following:

push the start vertex s onto the stack
repeat until we find the goal vertex or the stack is empty:
  pop the next vertex u from the stack
  if u has not been visited 
    remember that u has been visited (and maybe do something while here) 
    for each vertex v that is adjacent to u
      if haven't already visited v 
        push v onto the stack

What ADT would you suggest to use, to keep track of which vertices have been visited?

The book uses recursion instead of an explicit stack. We discussed that relationship when we first discussed stacks, so comparing and contrasting here might help drive that home. Here is a recursive version:

visit(u) {
  if we found the goal, halt and declare victory
  if u has not been visited 
    remember that u has been visited (and maybe do something while here) 
    for each vertex v that is adjacent to u
      if haven't already visited v 
        visit(v)
}

visit(s)

The book also notes that edges can be broken into two groups for an undirected graph: discovery edges and back edges. Discovery edges are the first edge to find a vertex. They form a tree, the DFS tree. Back edges go back up the tree to an ancestor, and indicate that a cycle exists in the graph. With DFS on a directed graph, there are additional relationships in the DFS tree: forward edges connect a node to a descendent and cross edges connect two nodes, neither of which is the ancestor of the other.

How long does DFS take? We visit each vertex and each edge at most once. For a graph with n vertices and m edges, the running time is O(n + m).

The book presents a lot of problems that can be solved by DFS — reachability, strongly connected components, transitive closure, and topological sorting. An easy one to think about: does a graph have a loop in it? How would we detect that during DFS?

Breadth first search

For searching, DFS is a fine technique if we just want to find the goal, and don't want to keep too many alternatives around. But what if we want to find the shortest path? An example: the Bacon game / six degrees of separation. In that case, instead of searching in a depth-first manner, we search in a breadth-first manner.

Breadth-first search (BFS) expands uniformly in all "directions", like radiating ripples; the directions here are edges out of a vertex. So it finds all the vertices 1 step away, then all those 2 steps away, etc. Thus we know once we've found the goal, we've got the shortest set of edges from the start vertex to it. Do you see why?

This requires a small change in our search strategy: rather than using a stack, we use a queue. With a just a few word changes to our algorithm, we go from DFS to BFS:

enqueue the start vertex s onto the queue
repeat until we find the goal vertex or the queue is empty:
  dequeue the next vertex u from the queue
  if u has not been visited
    remember that u has been visited (and maybe do something while here) 
    for all vertices v that are adjacent to u
      if haven't already visited v
        enqueue v onto the queue

While the above code works, we can optimize it a bit by noting that the first item added to the queue will be the first item off of the queue. Therefore when we first discover a vertex we can consider it visited as we add it to the queue, knowing that later discoveries of the vertex need not be added to the queue. (This does not work for DFS, because the most recently added copy of the vertex is the one that comes off of the stack first, and thus gets visited with its adjacent vertices pushed onto the stack.) The optimized version is:

enqueue the start vertex s onto the queue
remember that s has been added
repeat until we find the goal vertex or the queue is empty:
  dequeue the next vertex u from the queue
    (maybe do something while here)
    for all vertices v that are adjacent to u
      if haven't already added v
        enqueue v onto the queue
        remember that v has been added

Note that we can't just look in the queue to see if v has already been added, as it might have been removed by now. So we'll again need to keep an auxiliary structure for that. I just changed "visited" to "added" because it didn't feel like much of a visit yet — not till we actually remove it and maybe do something with it.

The edge types in BFS are discovery edges or cross edges. No edge goes back to an ancestor, because it would have been visited in the forward direction earlier.

Instead of just keeping track of which vertices we've visited, it's helpful to keep track of how we arrived at each vertex. So instead of a visited list, we build a visited tree, where the parent of a vertex is the vertex from which we discovered it. Note that the parent is unique (since we don't visit a vertex twice), and thus gives a tree. However, we will often use a graph ADT to hold this tree. It gives us a way to do edge labels and to find particular vertices quickly.

We can do a BFS from a start vertex without any goal in mind — just run until we've visited the whole graph (or at least the part of it that is reachable from the start vertex). Then we have a BFS tree that has shortest paths from each reachable vertex to the start vertex. To find such a shortest path, we just find the desired vertex (e.g., an actor) in the tree, and follow edges back to the start vertex (e.g., Kevin Bacon); we can keep track of the edges (movie titles) on the way back. Demo.

Like DFS, BFS visits each node and edge once, for a total of O(n + m).