Monthly Archives: December 2025

How Dijkstra’s Algorithm Works (The Logic Behind It).

Recently I started working on a library in Haskell that would implement dijkstra’s algorithm both for fun and for an upcoming project. What I learned is that I didn’t initially understand dijkstra’s algorithm as well as I had thought and many of the explanations out there weren’t very good and often relied on showing an example of it in action.

Teach principles, not formulas.

– Richard Feynman

Let’s learn the logic behind Dijkstra’s algorithm, shall we?

What’s the point of it?

To the best of my knowledge, Dijkstra’s algorithm only shows up in a few places (There are only two or three that I’m aware of), but it’s vital to understand if you want to play around with graph theory, and if you want to traverse graphs.

There’s this amazing technology out there that you may have heard of. It’s called GPS technology. Technically GPS refers to the global positioning system. It’s a system of satellites in space that send out radio signals which can be received by any GPS capable device which can then use those signals to calculate the exact latitude, longitude, and height of the device.

However, when most people think of GPS they think of navigation systems that can give you directions. That’s what we’ll be looking at here. Dijkstra’s algorithm allows us to plan the shortest route from point A to point B.

The first step in Dijkstra’s algorithm actually comes before we’ve even written any lines of code to implement dijkstra’s algorithm. We need a map. Thankfully there are maps out there that we can download of entire cities that are machine-readable. The idea is simple: take a map and turn it into what’s called a “graph”.

A graph (as in the kind from graph theory) is a set of circles (called “nodes”) and lines connecting them (called “edges”). There’s more than one kind of graph. For our example we’ll be looking at graphs that are directed and weighted. Directed means they have arrows instead of just regular lines (sometimes streets are one-way), and weighted meaning that each arrow has a number attached to it to indicate how long the path is, or how much time it will take to get there.

The edges will be roads, and the nodes will be intersections. For streets that go both ways we can simply set up a second arrow that points in reverse. So for one-way roads there will be one arrow, and for two-way roads there will be two pointing in opposite directions.

Once we have that our navigation system will figure out which node is closest to where we are, and which node is closest to the destination and have use navigate a short distance to the start, and when we get to the end it will say something like “Your destination is on the left” and then it will cowardly end the trip and not tell us how to get out of the car and walk into the building we’re headed for.

Getting lost in it

So how do we navigate graphs? Up until now everything is fairly simple, but how do you get the shorted path from A to B? When designing an algorithm (or, in our case, understanding one) we need to be able to have some insights into the problem. For example: if we were to magically somehow know that a given node is part of the shortest path from A to B then the shortest path from A to that node is the first part of the trip, and the shortest path from our middle node to B is the second part of the trip.

If we know the shortest path from the start node to each node on the graph then we’ve essentially solved the problem. We just need to get the path associated with the end node.

So here’s how we do it. First we’ll create an army of wind-up robots that will go through the graph. Each of these wind up robots will be a data structure which contains two pieces of information: the total distance the bot has traveled so far from the starting node, and the exact path it took. The distance can be an integer or float, and the path variable is a list (or array if you’re brave enough) containing the names of (or pointers to) each node that we’ve traveled along so far in the order we’ve gone.

Each node is a data structure which has a name of some kind (an integer or string) and an array or list of edges. These edges are each data structures that contain a number (their weight) and a pointer or identifier that points to their destination node. We’ll also tack on two pieces of information to the node structs: the distance we’ve traveled to get there, and the path we took (just like with our bots).

When a wind-up robot crosses an edge it takes the number from that edge and adds it to it’s internal number, and adds the edge it just crossed to the path variable inside it (this is a fairly obvious thing to do since that’s what traveling along a graph means). Now, assuming that all edges have a positive, non-infinite number associated with them, we should be able to pull off a simple trick: when a bot lands on a new node it will check the number the node has, and if the number on the node is greater than the number the bot is carrying then the bot replaces the number on that node with the one it was carrying. It also does the same with the path variable it had.

When another bot later finds itself on that node, if the bot had found a shorter path to get there then it will replace the path and number on that node. If it took a longer path then it will be able to tell from comparing the number on the node to it’s number and it will know to backtrack to where it just came from.

Hopefully some gears are starting to turn in your head. We’ll initialize every nodes’ number to be infinity because all numbers are less than infinity and because floating point numbers can represent infinity as a value (pretty handy here) except for the start node which will be zero. All their paths will initially be empty as well.

This sounds a lot like a problem involving recursion, doesn’t it? We could write a function that acts as one single bot and it can take in our initial node. It will then call itself once for each edge that node has as it travels along them. It can follow the rules from earlier and after it’s done investigating every top-level edge it should have visited every node enough times to have found the shortest path to that node, and so we should have solved the problem by that point.

Here’s some pseudo code:

function dijkstra (distSoFar, pathSoFar, node):
        if distSoFar < node.dist:
                node.dist = distSoFar
                node.path = pathSoFar
        for edge in node.edges:
                dijkstra(distSoFar + edge.weight, pathSoFar + edge,edge.destNode) 

And that’s how it’s done. Except for a few problems: the first problem is that it doesn’t uses parallelism despite the fact that that seems like it could work here (that’s really a crime in itself). The second problem is that it takes way too long to run. I tried implementing it in Haskell with a dash of parallelism and left it running overnight on a graph of New York city and it wasn’t finished in the morning.

The Problem

The algorithm, as described, isn’t fast enough on large graphs. So what’s the real answer?We have to rethink some things here. First we’ll get rid of putting the path onto each node, and instead store info on what edge it was that took us to that node from the shortest path (thus each node has info on what node came just before it, and what edge was used to get there).

We’ll also drop carrying the path in each bot, and instead focus on just the distances. Once all is said and done we can work backwards from the end node to build up a path to it. So how can we do all of this more efficiently? This is where priority queues come into play.

First make a queue that contains pointers to all nodes which are initialized to infinity and have some undefined value for the previous node except, of course, for the start node which will have a distance of zero.

Then we let our bot loose on the start node. It goes along each edge going out from the node, but doesn’t recurse down the next edges. For each node it meets it decides whether or not to update the distance on that node. The bot will also not accumulate distances anymore and instead will just use the distance from the node it just left and add that to the weight of the edge it traveled along.

Next we have a list of data structures that point to each node. Most of the nodes still have an infinite distance, but some of them have different finite distances. So we sort the list based on the distances that each node has (or do the smart thing and not call up a sort function every time by simply knowing what nodes to try and move and not checking the entire list) and now the first node in the queue has the shortest distance (the start node since that started with zero).

Then we remove the first node from the queue. Then for every entry in the queue we do the same. We have a bot travel along from that node along each outgoing edge that node has, update the distances, resort the priority queue, and then drop the first entry in the queue.

This ensures that we are starting at the start node, and slowly working our way to each adjacent node, and then each node after that, and then the ones after that. On the graph it looks like we’re fanning out from the start node.

According to Wikipedia we’ll be done either once we reach the end node on the queue or when there are no more nodes in the queue (stopping at the end node seems to be some sort of optimization). The idea is that we’re starting with the node that has the shortest distance each time we look at the priority queue, and updating the adjacent nodes so that they can (possibly) be updated by the node with the shortest path so far.

The nodes that are updated each round can sometimes be the nodes that were already removed from the queue, and every node will have a chance to update all of their neighbors. All nodes get reached eventually since we’re pulling them from a queue and since we’re always starting with the shortest distances that means every node will be quickly updated with the shorter distances first.

The nodes that are updated each round can sometimes be the nodes that were already removed from the queue, and every node will have a chance to update all of their neighbors. All nodes get reached eventually since we’re pulling them from a queue and since we’re always starting with the shortest distances that means every node will be quickly updated with the shorter distances first.