7.3.2 Basic Example: Breadth First Search

A graph is a series of nodes connected in any way using arcs. An arc is the connection between two nodes.

Arcs can be one way like you see from A to D, or they can be two ways, like A to B. Pathways can be established between nodes, such as A to B to C.

Complexity grows when you consider that not all arcs may be equal. For example, when you think of a graph connecting cities with roadways or airline routes, the arcs may represent miles with some being longer than others.

This complexity makes implementation challenging. Typical graph implementation usually follows one of three strategies:

Set of Arcs: One implementation strategy is to create a set of all arcs. This method tends to be efficient in storing the arcs but requires lots of searching to find a particular path through the graph.

Adjacency Matrix: An adjacency matrix is a 2D vector that essentially tracks the nodes across the top and left, then uses booleans inside the matrix to show which nodes are connected. This method is fast to search but can take a lot of extra data, especially when the number of connections is small relative to the number of nodes.

Adjacency List: The adjacency list tends to be the middle ground between the above two. Adjacency lists create a list of all outgoing connections for a particular node.

In this lesson, you are going to explore implementing graphs using the adjacency list method.

Using a Graph

So what do you do with a graph? Oftentimes the purpose of a graph is to see how two different nodes may be connected and the path between them. For example, in the graph shown above you may want to know which nodes you can get to from node B. A quick analysis shows that you can get to each node via the following paths:

B->A: A B->C: C B->D: A - D B->E: C - E

Node B connects with a distance of 1 to A and C and a distance of 2 to D and E.

A similar analysis of node D can be done:

D->A: None D->B: None D->C: C D->E: E

Because the nodes are one way, D has no path to either A or B and a direct path to C and D.

Traversing the Graph

To analyze the graph, there are multiple ways to traverse through the nodes. Similar to what you saw in recursion, you can do what is called a depth-first search which means you start at a node and go as far as you can before you back up.

For example, in the above graph, if you started at A, a depth-first search would have you search from A to B to C to E, then back up to search D.

In contrast, the breadth-first search looks at all the nodes 1 distance away, then all the nodes at a distance of 2, then 3, etc.

If you search the above graph using a breadth-first search from A, you would search to B, then to D. From there, you would search from B to C and from D to E. Finally, you would search C to E.

Using a depth-first search, you would discover that the path from A to E has a distance of 3. You may not discover that there is a shorter path from A to E via D since you have already discovered the connection.

A breadth-first search has the advantage that it will always discover the shortest path between two nodes, assuming the nodes are equally weighted.

For simplicity, in this lesson, you will only be looking at a breadth-first search assuming equally weighted nodes.

Implementation

As mentioned before, implementing a graph can be complicated. The full implementation can be seen below, but let’s break it down.

Node Struct

In addition to having the data (in this case the id field), the adjacency list gets implemented with the vector of nodes pointing to connections. In this vector, you will store each node that this particular node is connected to.

struct node{
    int id;
    vector<node*> connection;
    bool visited;
    node *prev;
    int distance;
};

The three remaining fields in the struct will be used to map out a path using the breadth-first search.

Creating the Graph

The createGraph function loads the following graph:

Notice that each node is a two-way arrow. When loading the graph, all the visited fields are set to false, which is the state used before the node is visited with the breadth-first search.

The calcDistance function is the breadth-first search. The search does two things, it calculates the distance from the start node to each node that it can reach, and it also maps the shortest path to that node. It does this by creating a queue of all the nodes it needs to visit and then looping until that queue is empty.

The first few lines are setup:

queue<node*> q;
start->visited = true;
start->distance = 0;
start->prev = NULL;
q.push(start);

By default, the starting node has been visited and has a distance of 0 and no previous node.

The while loop is used to map the path. Each time through the loop, the next node is removed from the queue and then all the connections for that node are explored using the for loop:

while (!q.empty()){
    node *currNode = q.front();
    q.pop();
    vector<node*> currConn = currNode->connection;
    for (node* conn : currConn){
        if (!conn->visited){
            conn->visited = true;
            conn->distance =  currNode->distance + 1;
            conn->prev = currNode;
            q.push(conn);
        }
    }
}

In the for loop, any node that has not yet been visited gets processed. Processing the node involves marking it as visited, setting the distance as one more than the distance to the current node (since it is one additional step from the current node), setting the previous node to the current node to establish the path, then add the node to the back of the queue.

Walk Through

Let’s look at a partial walk-through with the example graph. Using node 1 as a starting point, the first loop through will set the currNode to node 1. Then the currConn will be 2 and 5. When processing node 2, the node will be marked as visited, the distance will be updated to 1, and 2 will be added to the queue. The same process happens for node 5.

In the next loop, 2 will be removed from the queue and set to the currNode. currConn will be 3 and 5. When 3 is processed, the node will be marked as visited and the distance will be marked as 2 since it is one more than the distance of node 2. Node three will be added to the queue. Next, the program will look at 5 from the currNode list, but it will not process it since it is marked as visited from the previous loop when node 1 was the currConn.

This process will continue until there are no more nodes in the queue.

Last updated