C++ Boost

Six Degrees of Kevin Bacon

A fun application of graph theory comes up in the popular game ``Six Degrees of Kevin Bacon''. The idea of the game is to connect an actor to Kevin Bacon through a trail of actors who appeared together in movies, and do so in less than six steps. For example, Theodore Hesburgh (President Emeritus of the University of Notre Dame) was in the movie ``Rudy'' with the actor Gerry Becker, who was in the movie ``Sleepers'' with Kevin Bacon. Why Kevin Bacon? For some reason the three students who invented the game, Mike Ginelli, Craig Fass, and Brian Turtle decided that Kevin Bacon is the center of the entertainment world. The Kevin Bacon game is quite similar to the Erdös Number which has ``been a part of the folklore of mathematicians throughout the world for many years''.

The ``Six Degrees of Kevin Bacon'' game is really a graph problem. If you assign each actor to a vertex, and add an edge between two actors if they have appeared together in a movie, then you have a graph that represents the data for this game. Then the problem of finding a trail of actors to Kevin Bacon becomes a traditional graph problem: that of finding a path between two vertices. Since we wish to find a path that is shorter than six steps, ideally we would find the shortest path between the vertices. One might think to apply Dijkstra's shortest path algorithm to this problem, but that would be overkill since Dijkstra's algorithm is meant for situations when each edge has an associated length (or weight) and the goal is to find the path with the shortest cumulative length. Since we are only concerned with finding the shortest paths in terms of the number of edges the breadth-first search algorithm will solve the problem (and use less time than Dijkstra's).

Input File and Graph Setup

For this example we will use a much smaller graph than all movies and all actors. The full source code for this example is in examples/kevin-bacon.cpp. We have supplied a file kevin_bacon.dat which contains a list of actor pairs who appeared in the same movie. Each line of the file contains an actor's name, a movie, and another actor that appeared in the movie. The ``|'' character is used as a separator. Here is an excerpt.

Patrick Stewart|Prince of Egypt, The (1998)|Steve Martin

Our first task will be to read in the file and create a graph from it. We use a std::ifstream to input the file.

  std::ifstream datafile("./kevin_bacon.dat");
  if (!datafile) {
    cerr << "No ./kevin_bacon.dat file" << endl;
    return -1;
  }

We will want to attach the actor's names to the vertices in the graph, and the movie names to the edgesWe use the property class to specify the addition of these vertex and edge properties to the graph. We choose to use an undirected graph, because the relationship of actors appearing together in a movie is symmetric.

  using namespace boost;
  typedef adjacency_list<vecS, vecS, undirectedS, 
    property<vertex_name_t, string>,
    property<edge_name_t, string, property<edge_weight_t, int> >
  > Graph;

To access the properties, we will need to obtain property map objects from the graph. The following code shows how this is done.

  boost::property_map<Graph, vertex_name_t>::type actor_name = get(vertex_name, g);
  boost::property_map<Graph, edge_name_t>::type connecting_movie = get(edge_name, g);
  boost::property_map<Graph, edge_weight_t>::type weight = get(edge_weight, g);

Next we can start to loop through the file, parsing each line into a list of tokens. stringtok is a C++ version of the old strtok standard C function.

  std::string line;
  while (std::getline(datafile,line)) {
    std::list<std::string> line_toks;
    boost::stringtok(line_toks, line, "|");
    ...
  }

Each line of the input corresponds to an edge in the graph, with the two actors as the vertices incident to the edge, and the movie name will be a property attached to the edge. One challenge in creating the graph from this format of input is that the edges are specified by a pair of actor names. As we add vertices to the graph, we'll need to create a map from actor names to their vertices so that later appearances of the same actor (on a different edge) can be linked with the correct vertex in the graph. The std::map can be used to implement this mapping.

  typedef graph_traits<Graph>::vertex_descriptor Vertex;
  typedef std::map<string, Vertex> NameVertexMap;
  NameVertexMap actors;

The first token will be an actor's name. If the actor is not already in our actor's map we will add a vertex to the graph, set the name property of the vertex, and record the vertex descriptor in the map. If the actor is already in the graph, we will retreive the vertex descriptor from the map's pos iterator.

  NameVertexMap::iterator pos; 
  bool inserted;
  Vertex u, v;

  std::list<std::string>::iterator i = line_toks.begin();

  boost::tie(pos, inserted) = actors.insert(make_pair(*i, Vertex()));
  if (inserted) {
    u = add_vertex(g);
    actor_name[u] = *i;
    pos->second = u;
  } else
    u = pos->second;
  ++i;

The second token is the name of the movie, and the third token is the other actor. We use the same technique as above to insert the second actor into the graph.

  std::string movie_name = *i++;
      
  boost::tie(pos, inserted) = actors.insert(make_pair(*i, Vertex()));
  if (inserted) {
    v = add_vertex(g);
    actor_name[v] = *i;
    pos->second = v;
  } else
    v = pos->second;

The final step is to add an edge connecting the two actors, and record the name of the connecting movie. We also set the weight of the edge to one. Since we chose setS for the EdgeList type of the adjacency_list, any parallel edges in the input will not be inserted into the graph.

  Edge e;
  boost::tie(e, inserted) = add_edge(u, v, g);
  if (inserted) {
    connecting_movie[e] = movie_name;
    weight[e] = 1;
  }

Applying Breadth-First Search

The documentation for breadth_first_search() algorithm shows that we will need to provide property maps for color and vertex ID. In addition, we will be using visitors to record predecessors and distances so we will need property maps for these as well. The distance will be the shortest distances from each actor to Kevin Bacon calculated by the algorithm. This distance has also been referred to as the ``Bacon Number'' in memory of the ``Erdos Number'' used to rank mathematicians according to how many publications separate them from Paul Erdos. We will use a vector to store the bacon numbers, and another vector to store the color property needed by the BFS algorithm. The predecessor vector will be used to record the shortest path from each actor to Kevin. For each vertex, the predecessor vector will contain a pointer to the next vertex on the shortest path towards Kevin Bacon.

  std::vector<int> bacon_number(num_vertices(g));
  std::vector<int> color(num_vertices(g));
  std::vector<Vertex> predecessor(num_vertices(g));

The BFS algorithm also requires a source vertex as input; of course this will be Kevin Bacon. We now call the BGL BFS algorithm, passing in the graph, source vertex, the property maps, and a visitor composed of a predecessor and distance recorder. We use the predecessor_recorder to record the predecessors and the distance_recorder to record distances.

    Vertex src = actors["Kevin Bacon"];
    
    breadth_first_search
      (g, src, 
       visitor(make_bfs_visitor
	       (make_list(record_predecessors(&predecessor[0], 
					      on_examine_edge()),
			  record_distances(&bacon_number[0],
					   on_examine_edge()))))
       );

We can output the Bacon number for each actor simply by looping through all the vertices in the graph and use them to index into the bacon_number vector.

    graph_traits<Graph>::vertex_iterator i, end;
    for (boost::tie(i, end) = vertices(g); i != end; ++i)
      std::cout << actor_name[*i] << "'s bacon number is " 
                << bacon_number[*i] << std::endl;

Note that vertex descriptor objects can not always be used as indices into vectors or arrays such as bacon_number. This is valid with the adjacency_list class with VertexList=vecS, but not with other variations of adjacency_list. A more generic way to index based on vertices is to use the ID property map (vertex_index_t) in coordination with the iterator_property_map.

Here are some excepts from the output of the program.

William Shatner was in Loaded Weapon 1 (1993) with Denise Richards
Denise Richards was in Wild Things (1998) with Kevin Bacon
Patrick Stewart was in Prince of Egypt, The (1998) with Steve Martin
...
William Shatner's bacon number is 2
Denise Richards's bacon number is 1
Kevin Bacon's bacon number is 0
Patrick Stewart's bacon number is 2
Steve Martin's bacon number is 1    
...


Copyright © 2000-2001 Jeremy Siek, Indiana University (jsiek@osl.iu.edu)