A Star in Python

Posted: , Last Updated:

def a_star(graph, heuristic, start, goal):
    """
    Finds the shortest distance between two nodes using the A-star (A*) algorithm
    :param graph: an adjacency-matrix-representation of the graph where (x,y) is the weight of the edge or 0 if there is no edge.
    :param heuristic: an estimation of distance from node x to y that is guaranteed to be lower than the actual distance. E.g. straight-line distance
    :param start: the node to start from.
    :param goal: the node we're searching for
    :return: The shortest distance to the goal node. Can be easily modified to return the path.
    """

    # This contains the distances from the start node to all other nodes, initialized with a distance of "Infinity"
    distances = [float("inf")] * len(graph)

    # The distance from the start node to itself is of course 0
    distances[start] = 0

    # This contains the priorities with which to visit the nodes, calculated using the heuristic.
    priorities = [float("inf")] * len(graph)

    # start node has a priority equal to straight line distance to goal. It will be the first to be expanded.
    priorities[start] = heuristic[start][goal]

    # This contains whether a node was already visited
    visited = [False] * len(graph)

    # While there are nodes left to visit...
    while True:
        # ... find the node with the currently lowest priority...
        lowest_priority = float("inf")
        lowest_priority_index = -1
        for i in range(len(priorities)):
            # ... by going through all nodes that haven't been visited yet
            if priorities[i] < lowest_priority and not visited[i]:
                lowest_priority = priorities[i]
                lowest_priority_index = i

        if lowest_priority_index == -1:
            # There was no node not yet visited --> Node not found
            return -1

        elif lowest_priority_index == goal:
            # Goal node found
            # print("Goal node found!")
            return distances[lowest_priority_index]

        # print("Visiting node " + lowestPriorityIndex + " with currently lowest priority of " + lowestPriority)

        # ...then, for all neighboring nodes that haven't been visited yet....
        for i in range(len(graph[lowest_priority_index])):
            if graph[lowest_priority_index][i] != 0 and not visited[i]:
                # ...if the path over this edge is shorter...
                if distances[lowest_priority_index] + graph[lowest_priority_index][i] < distances[i]:
                    # ...save this path as new shortest path
                    distances[i] = distances[lowest_priority_index] + graph[lowest_priority_index][i]
                    # ...and set the priority with which we should continue with this node
                    priorities[i] = distances[i] + heuristic[i][goal]
                    # print("Updating distance of node " + i + " to " + distances[i] + " and priority to " + priorities[i])

                # Lastly, note that we are finished with this node.
                visited[lowest_priority_index] = True
                # print("Visited nodes: " + visited)
                # print("Currently lowest distances: " + distances)

About the algorithm and language used in this code snippet:

A Star Algorithm

The A star (A*) algorithm is an algorithm used to solve the shortest path problem in a graph. This means that given a number of nodes and the edges between them as well as the “length” of the edges (referred to as “weight”) and a heuristic (more on that later), the A* algorithm finds the shortest path from the specified start node to all other nodes. Nodes are sometimes referred to as vertices (plural of vertex) - here, we’ll call them nodes.

Description of the Algorithm

The basic principle behind the A star (A*) algorithm is to iteratively look at the node with the currently smallest priority (which is the shortest distance from the start plus the heuristic to the goal) and update all not yet visited neighbors if the path to it via the current node is shorter. This is very similar to the Dijkstra algorithm, with the difference being that the lowest priority node is visited next, rather than the shortest distance node. In essence, Dijkstra uses the distance as the priority, whereas A* uses the distance plus the heuristic.

Why does adding the heuristic make sense? Without it, the algorithm has no idea if its going in the right direction. When manually searching for the shortest path in this example, you probably prioritised paths going to the right over paths going up or down. This is because the goal node is to the right of the start node, so going right is at least generally the correct direction. The heuristic gives the algorithm this spatial information.

So if a node has the currently shortest distance but is generally going in the wrong direction, whereas Dijkstra would have visited that node next, A Star will not. For this to work, the heuristic needs to be admissible, meaning it has to never overestimate the actual cost (i.e. distance) - which is the case for straight line distance in street networks, for example. Intuitively, that way the algorithm never overlooks a shorter path because the priority will always be lower than the real distance (if the current shortest path is A, then if there is any way path B could be shorter it will be explored). One simple heuristic that fulfils this property is straight line distance (e.g. in a street network)

In more detail, this leads to the following Steps:

  1. Initialize the distance to the starting node as 0 and the distances to all other nodes as infinite
  2. Initialize the priority to the starting node as the straight-line distance to the goal and the priorities of all other nodes as infinite
  3. Set all nodes to “unvisited”
  4. While we haven’t visited all nodes and haven’t found the goal node:

    1. Find the node with currently lowest priority (for the first pass, this will be the source node itself)
    2. If it’s the goal node, return its distance
    3. For all nodes next to it that we haven’t visited yet, check if the currently smallest distance to that neighbor is bigger than if we were to go via the current node
    4. If it is, update the smallest distance of that neighbor to be the distance from the source to the current node plus the distance from the current node to that neighbor, and update its priority to be the distance plus its straight-line distance to the goal node

Example of the Algorithm

Consider the following graph: Graph for the A Star (A*) shortest path algorithm

The steps the algorithm performs on this graph if given node 0 as a starting point and node 5 as the goal, in order, are:

  1. Visiting node 0 with currently lowest priority of 8.0
  2. Updating distance of node 1 to 3 and priority to 9.32455532033676
  3. Updating distance of node 2 to 4 and priority to 10.32455532033676
  4. Visiting node 1 with currently lowest priority of 9.32455532033676
  5. Updating distance of node 3 to 9 and priority to 11.82842712474619
  6. Updating distance of node 4 to 13 and priority to 15.82842712474619
  7. Visiting node 2 with currently lowest priority of 10.32455532033676
  8. Visiting node 3 with currently lowest priority of 11.82842712474619
  9. Updating distance of node 5 to 12 and priority to 12.0
  10. Goal node found!

Final lowest distance from node 0 to node 5: 12

Runtime of the Algorithm

The runtime complexity of A Star depends on how it is implemented. If a min-heap is used to determine the next node to visit, and adjacency is implemented using adjacency lists, the runtime is O(|E| + |V|log|V|) (|V| = number of Nodes, |E| = number of Edges). If a we simply search all distances to find the node with the lowest distance in each step, and use a matrix to look up whether two nodes are adjacent, the runtime complexity increases to O(|V|^2).

Note that this is the same as Dijkstra - in practice, though, if we choose a good heuristic, many of the paths can be eliminated before they are explored making for a significant time improvement.

More information on how the heuristic influences the complexity can be found on the Wikipedia Article.

Space of the Algorithm

The space complexity of A Star depends on how it is implemented as well and is equal to the runtime complexity, plus whatever space the heuristic requires.

Python

The Python Logo

Python™ is an interpreted language used for many purposes ranging from embedded programming to web development, with one of the largest use cases being data science.

Getting to “Hello World” in Python

The most important things first - here’s how you can run your first line of code in Python.

  1. Download and install the latest version of Python from python.org. You can also download an earlier version if your use case requires it - many technologies still require it due to the breaking changes introduced with Python 3.
  2. Open a terminal, make sure the python or python3 command is working, and that the command your’re going to be using is referring to the version you just installed by running python3 --version or python --version. If you’re getting a “command not found” error (or similar), try restarting your command line, and, if that doesn’t help, your computer. If the issue persists, here are some helpful StackOverflow questions for Windows, Mac and Linux.
  3. As soon as that’s working, you can run the following snippet: print("Hello World"). You have two options to run this: 3.1 Run python in the command line, just paste the code snippet and press enter (Press CTRL + D or write exit() and press enter to exit). 3.2 Save the snippet to a file, name it something ending with .py, e.g. hello_world.py, and run python path/to/hello_world.py. Tip: use the ls command (dir in Windows) to figure out which files are in the folder your command line is currently in.

That’s it! Notice how printing something to the console is just a single line in Python - this low entry barrier and lack of required boilerplate code is a big part of the appeal of Python.

Fundamentals in Python

To understand algorithms and technologies implemented in Python, one first needs to understand what basic programming concepts look like in this particular language.

Variables and Arithmetic

Variables in Python are really simple, no need to declare a datatype or even declare that you’re defining a variable; Python knows this implicitly.

a = 1
b = {'c':2}

print(a + b['c']) # prints 3

Arrays

Working with arrays is similarly simple in Python:

arr = ["Hello", "World"]

print(arr[0]) # Hello
print(arr[1]) # World
# print(arr[2]) # IndexError

arr.append("!")

print(arr[2]) # !

As those of you familiar with other programming language like Java might have already noticed, those are not native arrays, but rather lists dressed like arrays. This is evident by the fact that no size needs to be specified, and elements can be appended at will. In fact, print(type(arr)) prints <class 'list'>. This means that arrays in Python are considerably slower than in lower level programming languages. There are, however, packages like numpy which implement real arrays that are considerably faster.

Conditions

Just like most programming languages, Python can do if-else statements:

value = 1
if value==1:
    print("Value is 1")
elif value==2:
    print("Value is 2")
else:
    print("Value is something else")

Python does however not have case-statements that other languages like Java have. In my opinion, this can be excused by the simplicity of the if-statements which make the “syntactic sugar” of case-statements obsolete.

Loops

Python supports both for and while loops as well as break and continue statements. While it does not have do-while loops, it does have a number of built-in functions that make make looping very convenient, like ‘enumerate’ or range. Here are some examples:

value = 10
while value > 0:
    print(value)
    value -= 1

for index, character in enumerate("banana"):
    print("The %d-th letter is a %s" % (index + 1, character))

Note that Python does not share the common iterator-variable syntax of other languages (e.g. for(int i = 0; i < arr.length; i++) in Java) - for this, the enumerate function can be used.

Functions

Functions in Python are easily defined and, for better or worse, do not require specifying return or arguments types. Optionally, a default for arguments can be specified:

def print_something(something="Hello World"):
    print(something)
    return "Success"

print_something()
print(print_something("banana"))

(This will print “Hello World”, “Banana”, and then “Success”)

Syntax

As you might have noticed, Python does not use curly brackets ({}) to surround code blocks in conditions, loops, functions etc.; This is because Python depends on indentation (whitespace) as part of its syntax. Whereas you can add and delete any amount of whitespace (spaces, tabs, newlines) in Java without changing the program, this will break the Syntax in Python. This also means that semicolons are not required, which is a common syntax error in other languages.

Advanced Knowledge of Python

Python was first released in 1990 and is multi-paradigm, meaning while it is primarily imperative and functional, it also has object-oriented and reflective elements. It’s dynamically typed, but has started offering syntax for gradual typing since version 3.5. For more information, Python has a great Wikipedia article.