Topics
Track your progress across all skills in your objective. Mark your confidence level and identify areas to focus on.
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
Track your progress across all skills in your objective. Mark your confidence level and identify areas to focus on.
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
Track your progress across all skills in your objective. Mark your confidence level and identify areas to focus on.
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
A graph is a type of diagram made up of vertices which are connected by edges.
An edge can connect any vertex to any vertex, including the same vertex. A self edge is an edge that connects a vertex to itself.
The graph to the right has vertices A,B,C and D, and edges between
A and A
A and B
A and C
B and C
B and D
C and D
Graphs are a powerful representation for understanding the connections or relationships between things.
When two vertices have an edge between them, we say that they are adjacent.
In the graph to the right, the vertices adjacent to D are C and B.
The vertices adjacent to C are all the other vertices: A,B and D.
The vertices adjacent to A are B,C and also A itself because it has a self edge (this sort of depends on convention, so the IB won't ask you this).
If two edges "connect" via a vertex, we say they are adjacent.
In the graph to the right, the edge AB is adjacent to the edges
DB (they share vertex B)
both edges CB (they share vertex B)
AC (they share vertex A)
AA (they share vertex A)
The degree of a vertex is the total number of edges that connect to that vertex. To determine the degree of a vertex, count rhe
In the graph to the right, the vertex C has the incident edges:
CD
AC
CB twice
So the degree of C is 4.
Repeat edges are important, as when they exist the degree of a vertex is not the number of vertices it is connected to.
Self edges are tricky: In the graph above there is an edge between A, so that edge "connects" to A twice. It therefore counts 2 towards the degree of A, which is therefore 4 once we consider AC and AB.
A graph is said to be connected if it is possible to get from any vertex to any other vertex by walking along the edges.
Connected
Not connected
A graph where every vertex is connected by at least one edge to every other vertex is said to be complete.
Complete
Not complete - no edge BE
Weighted graph is one where each edge has a number which we call a weight. In applied settings, these represent important properties of the connection or relation between vertices.
The first graph below shows a weighted graph of bus connections between cities, where weighted graphs represent the driving distance, in kilometers, between the two cities.
The second graph shows a plumbing network between different junctions, with directed edges representing how water can move. Here the edge weights represent the volume of water that can move through the various pipes per second.
A directed graph is a special case where the edges have direction. These directions are represented by arrows on the graph.
For a directed graph, we can count the in-degree and out-degree. Both are similar to degree, but we only count edges pointing in for in-degree, and edges pointing out for out-degree.
On the graph to the right, the vertex C has
The edges D→C, A→C and B→C pointing in, so its in-degree is 3.
The edge C→B pointing out, so its out-degree is 1.
The vertex A has
the edge A→A pointing in, so an in-degree of 1.
the edges A→A,A→C and A→B pointing out, so an out-degree of 3.
Notice that self-edges count towards both the out-degree and in-degree, since they leave and enter the same vertex.
A directed graph is strongly connected if you can get from any vertex to any other vertex by following the directed edges of the graph.
Strongly connected
Not strongly connected - no way from C to A.
A directed graph is strongly connected if you can get from any vertex to any other vertex by following the directed edges of the graph.
Strongly connected
Not strongly connected - no way from D to A.
A simple graph is one where there are no
directed edges
self edges
repeated edges
Simple graph
Not simple graphs
An adjacency matrix is a matrix or table that shows the possible movements between vertices on a graph.
Each row is a starting vertex, and each column an ending vertex. The nth row and jth column contains
0 if there is no edge from vertex n to vertex j
1 for each edge from n to j.
Note that the rows and columns are usually ordered alphabetically.
The following table shows a few examples of what each cell represents.
An adjacency matrix for an undirected graph is necessarily symmetric long its main diagonal, since every edge is two-way.
In this case, a self edge counts double because it contributes 1 towards both the start and finish.
We can read the degree of a vertex from an adjacency matrix by adding up the values in the relevant row / column.
Undirected graph
An undirected graph is symmetric, so we can look add up a row or column to find the degree.
In the above adjacency matrix, vertex A has degree 1+2+0=3.
Directed graph
Vertices on directed graphs have an in-degree and out-degree.
Since the rows show the source vertex, adding up a row gives the out-degree of a vertex. Adding up a column gives the in-degree of a vertex.
In the above adjacency matrix, vertex A has out-degree 1+0+0=1, and in-degree 1+1+1=3.
A walk is a sequence of valid moves on a graph, going from vertex to vertex along edges that allow movement in that direction.
The walk ACDC is shown on an undirected graph above.
The walk ABBA on a directed graph. Notice how the dashed edges are numbered to indicate the order they are in the walk.
A trail is a special type of walk where no edges are repeated.
There is no restriction on whether vertices can be repeated.
The walk above is a trail because no edge is walked more than once.
The walk above is not a trail because AE is walked twice.
A path is an even more special type of walk where no edges or vertices are repeated.
The walk above is a path.
The walk above is not a path, since A is visited twice.
A circuit is a special type of walk that starts and ends at the same vertex and does not reuse any edges.
The walk above is not a circuit, since it does not start and end at the same place.
The walk above is a circuit.
The walk above is a circuit.
Repeats DB so not a circuit.
A cycle is a special case of a circuit where no vertex, except the last one, is repeated.
Every cycle is a circuit, but not every circuit is a cycle.
The walk above is a cycle.
Vertex A repeated in the middle so not a cycle, though it is a circuit.
By raising the adjacency matrix to the power of k, we can find the number of possible walks with length k between any two vertices.
The graph on the left has the adjacency matrix
Let's consider the walks of length 5:
To count the number of length 5 walks that start at A and end at A, we look at the row for A (first row) and the column for D, where we see a 5.
The process is the same for an undirected graph, except that the adjacency matrix will be symmetric.
Taking an existing graph and erasing one or more vertex or edge produces a smaller graph called a subgraph.
Original Graph
Subgraph examples
A tree is a graph that is
connected
simple
has no cycles
A tree with n vertices always has exactly n−1 edges.
To see why this is true imagine starting with no edges (everything is disconnected), and adding edges 1 at a time.
The first edge we add connects 2 vertices, so our tree has 2 vertices and 1 edge
Then, we add an edge connecting one of these two vertices to one we have not connected to yet. This adds 1 new vertex to the tree, so now we have 3 vertices and 2 edges.
We keep going until we stop, and we have n vertices and n−1 edges.
We cannot have any more edges, because now that every vertex is connected, adding another edge would form a loop somewhere!
A minimum spanning tree is a tree with the smallest possible weight that connects every vertex in a graph. It is often given the abbreviation "MST".
Basically, the smallest sum of edges that keep everything connected. The total weight of the minimum spanning tree is then just the sum of the weights of the edges in the tree.
Below you can see a graph and its minimum spanning tree.
The total weight of the minimum spanning tree is 3+3+2+4+5=17.
Note that MSTs are not unique - some graphs have multiple MSTs with the same weight. For example, another MST for the graph above is:
Kruskal's Algorithm is a procedure for finding minimum spanning trees on a graph.
The algorithm works by selecting edges one at a time, starting with the smallest weight edge and working our way up. Any edge that would form a cycle (a loop of vertices) with the edges we've already selected gets skipped. We stop when every vertex is connected.
The following animation shows the process for one particular graph.
Note that there will often be ties for edge weights. This is the reason why minimum spanning trees are not unique, even though they all have the same total weight.. In IB questions, any valid MST will be accepted.
Prim's Algorithm is an algorithm for finding minimum spanning trees.
It works very similar to Kruskal's Algorithm, adding edges one at a time in order of smallest to largest weights, without forming cycles. The differences are that with Prim's algorithm:
We can start at any vertex, often specified in the question
we can only add edges that connect to the tree we've made so far.
Below is a side by side comparison of each step.
Prim's
Kruskal's
The difference is that at every step, all the vertices in Prim's algorithm are connected, while in Kruskal's they can be temporarily disconnected. They both produce trees of the same weight, and often will produce an identical tree.
Prim's algorithm is ideal for finding the minimum spanning tree when we have the adjacency table of a graph, instead of the graph itself.
Consider this example:
Let's start at B.
The 1st vertex we chose was B, so we put a 1 above that column.
For us this is the 9 in row E. If there was a tie, we would just pick randomly.
We just selected row E, so we delete that row, and add a 2 in the column above it.
Now we have columns B and E, so we look for the smallest number in those columns. That number is 8, which connects to A. So now we delete row A, number column A with a 3, and continue.
The next smallest number in a valid column is the 2 in row D. So we delete that row, and number that column with a 4.
Now we have columns A, B, D and E. The smallest remaining edge weight in any of those rows is the 13 in row C. So we circle it, cross out that row, and add a 5 above the column for C.
Now that all the rows are crossed out, we read the tree by looking at the cells we have circled:
BE with weight 9
AE with weight 8
AC with weight 13
AD with weight 2
An Eulerian circuit is a special kind circuit that traverses every edge exactly once.
In other words, a walk is Eulerian if it starts and ends at the same vertex, uses every edge, but does not reuse any edge.
Eulerian circuit ✓
Not an Eulerian circuit ✗, because DB not traversed.
Not an Eulerian circuit ✗, because DA traversed twice.
Not an Eulerian circuit ✗, does not start and end at the same vertex (not a circuit)
An Eulerian trail is a special kind of trail that traverses every edge exactly once.
It differs from an Eulerian circuit by the fact that a circuit must start and end in the same place, while an Eulerian trail can start and end at different vertices. Every Eulerian circuit is an Eulerian trail, but not versa. The following graph shows an Eulerian trail which is not an Eulerian circuit:
If a graph contains an Eulerian circuit, we call it an Eulerian graph.
If a graph contains an Eulerian trail but not an Eulerian circuit, we call it semi-Eulerian.
A graph can be Eulerian, semi-Eulerian, or neither.
Eulerian graph
Semi-Eulerian graph
If a graph is Eulerian, it must be that every vertex in the graph has an even degree.
If a graph is semi-Eulerian, then exactly two vertices in the graph have odd degree and all the others have even degree.
This also works backwards: If every vertex in a graph has even degree, then we know it's Eulerian. And if only two vertices have odd degrees, then we know it's semi-Eulerian.
The Eulerian graph above has vertices with even degree:
A: 4
B: 2
C: 2
D: 2
E: 2
The semi-Eulerian graph above has exactly two vertices with odd degree:
A: 1
B: 2
C: 3
D: 2
E: 2
This is true because an Eulerian circuit must visit every vertex, and each time it visits a vertex, it must use one edge to enter the vertex and one edge to leave the vertex, which means that edges have to come in pairs at every vertex.
A semi-Eulerian graph has exactly two vertices of odd degree. Any Eulerian trail on this graph must start at one of the vertices with odd degree and end at the other odd-degree vertex.
This is true because every time a trail passes through a vertex, it uses one edge to enter and one edge to leave. So every visit uses a pair of edges. There are only two exceptions, the starting vertex which we can leave without entering and the ending vertex which we enter and then never leave. Since an Eulerian trail has to use every edge, the only way to use an "unpaired" edge is at the start or the end.
Because A and C are the vertices with odd degree, any Eulerian trail on the graph must start at either A or C and end at the other vertex with odd degree.
A Chinese postman problem is any scenario where we're trying to find the shortest way to start and end at the same vertex and visit every edge in a graph at least once.
The name is very helpful for remembering what the problem actually is. Imagine a postman that starts at a post office at one of the vertices in the graph, and the postman has to travel along every edge, which you can imagine representing roads in a graph, to deliver mail to all the houses along all the roads. The postman must eventually return to the starting vertex, which is the post office, and wants to minimize the total distance that he has to cover.
The animation below shows a Chinese postman tour for a graph.
If a graph is Eulerian, then any Eulerian circuit is a solution to the Chinese postman problem.
Remember that an Eulerian circuit starts and ends in the same location, and visits every edge exactly once. The Chinese Postman problem looks for a circuit that visits every edge at least once, so a circuit that visits every edge exactly once is necessarily a shortest possible solution.
If a graph is semi-Eulerian, then by definition it does not have an Eulerian circuit, which means to visit every edge at least once we will have to repeat at least one edge.
We know that a semi-Eulerian graph has an Eulerian trail starting at one of the odd vertices and ending at the other. To find a solution to the Chinese Postman problem in this case, we first find an Eulerian Trail and then find the shortest possible path from the end of the trail back to its start.
In most exam scenarios, you will be asked to find the total weight of a solution and not necessarily find the Chinese postman tour itself. in the two-odd vertex scenario, the total weight of the solution will be the sum of all the weights of all the edges in the graph plus the weight of the shortest path between the two vertices of odd degree because those edges are going to have to be repeated.
In the example above, the vertices C and F both have degree 3, and they're the only ones with odd degree. The shortest path between them goes CDF with weight CD equals 5 and DF equals 5 for a total repeated weight of 10. the total weight of all the edges in the graph is 44, so the weight of the optimal solution is 44+10=54
The last case that IB exams will expect you to consider is the Chinese Postman problem when exactly four vertices are odd.
You can think of this case as an extension of the semi-Eulerian case, where two vertices were odd. In this case, we have two pairs of odd vertices so we're going to have to repeat two paths.
Consider the following graph with odd vertices A,B,C and D.
The shortest path between
A and B is 4
C and D is 6
A and C is 5
B and D is 1
A and D is 4+1=5
B and C is 1+6=7
We can pair them in 3 ways:
AB and CD, total weight 4+6=10
AC and BD, total weight 5+1=6
AD and BC, total weight 5+7=12
The smallest of these is 6, so the weight of the optimal solution is 37 (the sum of all weights in the graph) +6=43
A Hamiltonian path is a path which visits every vertex once, with no repetition of vertices.
Note that a Hamiltonian path does not require every edge to be visited. However, no edge can be repeated without repeating a vertex and so no edge is repeated either.
A Hamiltonian cycle is a cycle that visits every vertex with no vertex repeated except the start / end.
Note that the same graph can have multiple Hamiltonian Cycles.
The cycle AEFDABCA is not Hamiltonian since A is repeated in the middle.
The cycle AEFDCBA is Hamiltonian.
Both graphs above are Hamiltonian (they are the same graph).
When a graph has a Hamiltonian cycle, we call it a Hamiltonian graph.
When a graph has a Hamiltonian path, but not a Hamiltonian cycle, we call it a semi-Hamiltonian graph.
The traveling salesman problem is a scenario where we are looking for the shortest path that starts and ends at the same vertex on a graph, and visits every other vertex exactly once.
Each of these paths is a Hamiltonian cycle by definition. To find the solution, we're looking for the Hamiltonian cycle with the smallest total weight.
A Hamiltonian cycle with total weight 3+8+6+3+7=27.
A Hamiltonian cycle with total weight 3+8+9+3+4=27
The classical traveling salesman problem is a special case of the problem where the graph we're looking at is complete and where the shortest path between any two vertices is the edge between those vertices.
There is no "good" way to find the shortest Hamiltonian cycle other than finding all the Hamiltonian cycles and comparing their weights. There are, however, ways to find upper and lower bounds on the optimal solution. Then we know that the optimal solution is somewhere between those two.
In the special case where the graph is complete, we can use the so-called nearest neighbor algorithm to find an upper bound on the least weight Hamiltonian path.
A complete graph is always Hamiltonian.
The algorithm works like this:
Choose any starting vertex.
Move to the nearest vertex that has not yet been visited (choosing at random if there is a tie)
Repeat step 2 until all vertices are visited.
Move back to the starting vertex.
Add up the weights of this Hamiltonian cycle.
Because this cycle has visited every vertex exactly once and returned to the start, it is at least a solution to the traveling salesman problem. And if there is a more optimal solution, it's only because it has lower weight. That means that our solution can be considered an upper bound on the traveling salesman problem.
The deleted vertex algorithm is a method for determining a lower bound on the solution to the traveling salesman problem in the case where the graph is complete.
The algorithm is based on the following observation: If you take a traveling salesman tour, which is a loop through all the vertices, and you delete one of the vertices, what you end up with is a tree that spans all of the remaining vertices. That remaining tree cannot be any smaller than the minimum spanning tree.
We can work backwards from this fact by deleting a chosen vertex from our graph and finding the minimum spanning tree of what remains. Next, we reconnect our deleted vertex to this minimum spanning tree. And because we're looking for a cycle, we need at least two edges connected to our deleted vertex. We therefore choose the two shortest edges that connect to the MST.
The practical traveling salesman problem is the broader case of searching for the shortest Hamiltonian circuit in a graph that is not necessarily complete, and where the edges between vertices are not necessarily the shortest path between those vertices.
For example, in the graph below, the edge CE has weight 8, but CA+AE has a weight 7, so the shortest path between C and E is not the direct one.
The good news is that we can turn a practical traveling salesman problem into a classical case by finding the least distances between each pair of vertices.
The graph ABCDE is not complete, and the edge CE=8 is longer than path CAE=7. Let's build a table of least distances. In each cell, we put the shortest distance between the row and column vertex.
Note that since the graph is undirected, the table is symmetric, so there's no need to fill in the bottom half.
Now we essentially have this updated, complete graph, where every edge is the shortest distance between its vertices.
This takes us back to the classical case, where we can use the nearest neighbor and deleted vertex algorithms to find upper and lower bounds.
Track your progress across all skills in your objective. Mark your confidence level and identify areas to focus on.
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
Track your progress:
Don't know
Working on it
Confident
📖 = included in formula booklet • 🚫 = not in formula booklet
A graph is a type of diagram made up of vertices which are connected by edges.
An edge can connect any vertex to any vertex, including the same vertex. A self edge is an edge that connects a vertex to itself.
The graph to the right has vertices A,B,C and D, and edges between
A and A
A and B
A and C
B and C
B