Lab 3 (Part I): Search Algorithms (Machine Learning Onramp)
Breadth-First Search (BFS) algorithm
After completing this lab, you will be able to implement the Breadth-First Search (BFS) search algorithm in Python.
The first search problem we are focusing on is Nick’s route-finding problem in Romania, starting in Arad to reach Bucharest. The road map of Romania, which is the state space of the problem is given as follows:
Elements of Search Algorithm
As we discussed in the class, in order to execute a Search Algorithm, we need to define the following items:
- State Representation
- The State Space
- The Transition Model
Furthermore, you will need your program to be more robust to be able to implement other Search Algorithms without drastic changes. More on that later.
1. State Representation
In this problem, the only state we need to consider is the location of Nick. Therefore we can use the names of the cities as the state.
Since Nick is starting in Arad, going to Bucharest, we can define the initial state and goal state as:
initial_state = "Arad"
goal_state = "Bucharest"
2. The State space
The transition model provides the way to identify the child of a node in a search tree given a specific action. In this problem, we need to translate the whole state space from the geographical network into the program together with the step costs between the connected states.
The important information from the state space is the connections between states and the step costs between connected states. Notice that in this problem the connections are reversible. Therefore in our state space the connection from Arad to Zerind and the connection from Zerind to Arad are identical, hence only one instance of that connection is needed.
The most straightforward way of defining the state space is by using a nested array, in which each inner array consists of the two connected states and its cost.
Define a variable state_space
to store the state space. As seen below, it is a nested list (a list containing inner lists). The following code shows the first three elements in the nested list.
state_space = [
['Arad', 'Zerind', 75],
['Arad', 'Timisoara', 118],
['Timisoara', 'Lugoj', 111],
...
]
Task 1: Complete the list of cities and distances in the state_space
variable
3. The Transition model
In this problem, the transition model is defined by the action of traveling between connected cities. To represent this action, we can create a function that searches through the state space to find the children of the current node, which effectively represents the available travel options.
This function iterates through the state_space
variable, checking for connections linked to the current node. Each city connected to the current node becomes a child of that node. We define a new function called expandAndReturnChildren
to handle this process.
def expandAndReturnChildren(state_space, node):
children = []
for [n,m,c] in state_space:
if n == node:
children.append(m)
elif m == node:
children.append(n)
return children
This function explores the state_space, and returns a list of all possible children of node
.
Discussion: - how does this function actually work? read the code and see if you can understand its inner workings. - Instead of n,m,c, why not use something more descriptive like City1, City2, Cost ?
Task 2: Implement expandAndReturnChildren
in your program and test its validity
Using Python Classes & Objects (Object Oriented Programming)
Although we can proceed with the current code to implement BFS, it will become increasingly complex as the search deepens. To make the code more manageable and adaptable for future search algorithms, we can adopt an object-oriented approach. By defining a class to represent each "city" in the state space tree and using objects to model each city instance, we simplify the structure. This not only enhances readability but also makes the code more robust and flexible for implementing other search strategies.
To achieve this, we define each city as a Node. Each node stores key information, - state: the name of the city itself - parent: the name of the preceding city in the search tree. - children: a list cites in the branches in the search tree.
The Class definition also includes a method that adds children cities to the current node (more on that later)
class Node:
def __init__(self, state=None, parent=None):
self.state = state
self.parent = parent
self.children = []
def addChildren(self, children):
self.children.extend(children)
This structure allows us to trace the path from any node back to the initial state, enabling efficient path reconstruction once the goal is found.
Class in Python
In Python, to define a class, the class needs to have the function __init__
with at least one input self
. This function is called when an object is initiated with this class. The internal variable self
defines the properties of the object. The input arguments apart from self
for the function __init__
are the parameters to be passed when initiating a new object.
In a class, additional functions can also be defined, and these will be the methods for the object of this class.
With this implementation of node as a class, the expandAndReturnChildren
function is updated to be
def expandAndReturnChildren(state_space, node):
children = []
for [n,m,c] in state_space:
if n == node.state:
childnode = Node(m, node)
children.append(childnode)
elif m == node.state:
childnode = Node(n, node)
children.append(childnode)
return children
Discussion:
- Discuss the changes to
expandAndReturnChildren
(Specfically: Objects vs. Strings) - in this function, we first create a
childenode
object, then in the next line we append it toChildren
, why not do that in one line? - what about the cost? we seem to ignore so far, why?. can you modify the function to capture the cost as well? (more on that later)
Think about these issues before you continue.
A Quick check
Your current code should look like
initial_state = "Arad"
goal_state = "Bucharest"
state_space = [
['Arad', 'Zerind', 75],
['Arad', 'Timisoara', 118],
['Timisoara', 'Lugoj', 111],
...
]
class Node:
def __init__(self, state=None, parent=None):
self.state = state
self.parent = parent
self.children = []
def addChildren(self, children):
self.children.extend(children)
def expandAndReturnChildren(state_space, node):
children = []
for [n,m,c] in state_space:
if n == node.state:
childnode = Node(m, node)
children.append(childnode)
elif m == node.state:
childnode = Node(n, node)
children.append(childnode)
return children
Separation of Problem Definition vs. Algorithm Definition.
By defining the key elements—namely, the initial state, goal state, state space, and transition model—before implementing the BFS algorithm, we have effectively separated the problem from the algorithm.
This separation offers significant advantages in programming, particularly in terms of flexibility and reusability:
-
The initial state represents the starting point of the search—in this case, the city where the journey begins.
-
The goal state defines the destination or target that the algorithm is trying to reach.
-
The state space outlines all the possible connections between cities, including the cost of traveling from one to another. It serves as the map or environment within which the search operates.
-
The transition model defines how one can move from one state (city) to another. It governs the valid moves or actions available and is typically implemented as a function that returns the next possible states (children) from a given current state.
The Breadth-First Search (BFS) algorithm
Below is the complete code for the BFS Search Algorithm. Please review it first — we’ll then go through it step by step to understand how it works
The BFS Algorithm: The Complete Code
# The BFS Search Algorithm
def bfs(state_space, initial_state, goal_state):
# STEP 1 - Initialization
frontier = [] # the front-line, the CURRENT exploration/test node
explored = [] # list of nodes we already explored
found_goal = False
frontier.append(Node(initial_state, None)) # first node in the frontier is the initial_state
goal_node = Node()
solution = []
path_cost = 0
# STEP 2 - Search for Goal Loop
while not found_goal:
# 2.1 Manage the Frontier & Explored Lists
children = expandAndReturnChildren(state_space, frontier[0])
frontier[0].addChildren(children)
explored.append(frontier[0])
del frontier[0]
# 2.2 Goal Test
for child in children:
# first, check if node was alrady expanded or explored
if not (child.state in [e.state for e in explored]) and not (child.state in [f.state for f in frontier]):
# next, check if node is goal
if child.state == goal_state:
found_goal = True # end algorithm
print("Goal Found!")
goal_node = child # for later processing
frontier.append(child) # to search deeper
# 2.3 Progress output
print("Explored:", [e.state for e in explored])
print("Frontier:", [f.state for f in frontier])
print("Children:", [c.state for c in children])
print("")
# STEP 3 Find Solution path
solution = [goal_node.state] # start from goal node
trace_node = goal_node # use the trace node to trace the path back to the initial node
# 3.1 trace your steps and find the solution path
while trace_node.parent is not None: # are you back to the initial_state ?
solution.insert(0, trace_node.parent.state) # then find the parent and add it to the solution list
trace_node = trace_node.parent # set trace to parent (to go back one level, and repeat)
# 3.2 determine the cose of the solution path.
for i in range(len(solution) - 1):
city1 = solution[i]
city2 = solution[i + 1]
for [from_city, to_city, cost] in state_space:
if (from_city == city1 and to_city == city2) or (from_city == city2 and to_city == city1):
path_cost += cost
break # Exit once the matching pair is found
return solution, path_cost
Now, let's go step by step. As discussed in the class, the BFS algorithm revolves around the use of two key lists:
-
frontier: Nodes (cities) that are discovered but not yet explored
-
explored: Nodes (cities) that have been fully examined
The algorithm systematically moves nodes from the frontier to explored as it traverses the state_space
, continuing until the goal_state
is found. The major steps of the BFS algorithm include the following key steps:
-
Initialization:
Set up the initial variables, including placing the starting city into the frontier. -
The Search Loop:
This step repeatedly explores the search space by expanding nodes, testing for the goal, and updating progress. -
Solution & Cost:
Once the goal is found, this step traces the solution path back toinital_stat
, then calculates the its cost.
Step 0: Function definition and inputs
We will now implement the BFS algorithm as a Python function that takes the state_space
, initial_state
, and goal_state
as arguments (inputs).
Additionally, the BFS funtion utilizes the expandAndReturnChildren
function and the Node
Class defintion and its methods.
def bfs(state_space, initial_state, goal_state):
Step 1: Initialization
def bfs(state_space, initial_state, goal_state):
# STEP 1 - Initialization
frontier = []
explored = []
found_goal = False
frontier.append(Node(initial_state, None))
goal_node = Node()
solution = []
path_cost = 0
In this first step, we set up the key variables that drive the BFS process—defining the frontier
, explored
, and placing the initial state as the starting node. While some variables like goal_node
, solution
, and path_cost
are prepared for later use.
Discussion: - evaluate each variable definition and explain why it is needed - why is goal_node is defined as a blank object and not a String?
Node(initial_state, None)
As the root node has no parent, we use None
as the value of the parent of root node.
Step 2: The Search Loop
After initialization, the algorithm enters the search loop, which is the heart of BFS. In this loop, we systematically explore the nodes in the frontier
until the goal is found. Each cycle of the loop involves three main parts:
- 2.1 Managing the
frontier
andexplored
lists to track which nodes have been discovered and processed, - 2.2 Performing a goal test to check whether we’ve reached the destination, and
- 2.3 Outputting the current progress to help us observe how the search unfolds step by step.
We’ll now break down each of these subsection in more detail
# STEP 2 - The Search Loop
while not found_goal:
# 2.1 Manage the Frontier & Explored Lists
children = expandAndReturnChildren(state_space, frontier[0])
frontier[0].addChildren(children)
explored.append(frontier[0])
del frontier[0]
Discussion: - read each line and understand what is going on
# 2.2 Goal Test
for child in children:
# first, check if node was alrady expanded or explored
if not (child.state in [e.state for e in explored]) and not (child.state in [f.state for f in frontier]):
# next, check if node is goal
if child.state == goal_state:
found_goal = True # end algorithm
print("Goal Found!")
goal_node = child # for later processing
frontier.append(child) # to search deeper
Discussion: - what happens if we dont check for redundancies?
List comprehension
In step 2.2, we use a special Syntax; [e.state for e in explored]
which is referred to as List Comprehension. List comprehension provides a shorter syntax to create a new list.
This example loops through the list variable explored
and create a list with the values of .state
for each element (node) in the explored
list.
This one-liner is equivalent to
explored_nodes = []
for e in explored:
explored_nodes.append(e.state)
# 2.3 Progress output
print("Explored:", [e.state for e in explored])
print("Frontier:", [f.state for f in frontier])
print("Children:", [c.state for c in children])
print("")
Discussion: - Is this step really necessary? what value does it add to your code?
Step 3: Solution & Cost
Now that the algorithm has identified the goal_node
, we can trace the solution through the parent of the goal node all the way back to the root node (node with no parent). Then the function should return the solution of BFS.
# STEP 3 Find Solution path
solution = [goal_node.state]
trace_node = goal_node
# 3.1 trace your steps and find the solution path
while trace_node.parent is not None:
solution.insert(0, trace_node.parent.state)
trace_node = trace_node.parent
Discussion: - Read the code of each step and understand its inner workings - try to understand how the solution path is created in the end - Question: what is the difference between the Solution List, and the other lists we defined earlier (such as Frontier or Explored)?
Next, the final step, is the determine the cost of this solution path. At this point, we finally utilize the cost element that is part of the state_space
as we discussed early on.
# 3.2 determine the cose of the solution path.
for i in range(len(solution) - 1):
city1 = solution[i]
city2 = solution[i + 1]
for [from_city, to_city, cost] in state_space:
if (from_city == city1 and to_city == city2) or (from_city == city2 and to_city == city1):
path_cost += cost
break
Discussion: - Read the code of each step and understand its inner workings - what is += and why is it used here ? - In your opinion, is this an efficient way to calculate path cost? is there a better way?
Running the algorithm
Now we have two functions expandAndReturnChildren
and bfs
, alongside with the variables state_space
, initial_state
, and goal_state
.
To run a script to execute the bfs
function, we can have the script file structured as such:
initial_state = 'Arad'
goal_state = 'Bucharest'
state_space = [
...
]
class Node:
...
def expandAndReturnChildren(...):
...
def bfs(...):
...
print('Solution: ' + str(bfs(state_space, initial_state, goal_state)))
Discussion: - what if we want to run this code to test two other cities ? - for example, what if Nick was in Oradea and wanted to go to Vaslui - how would you modify the code (without redefining inital and goal we set above) to do that?
Task 3: Run and test your BFS Search Algorithm
Preparation for next week (Part II)
Reusing code is one of the most valuable skills a programmer can develop.
A great way to reuse the code for other search algorithms is to encapsulate it into a library or module, which you can then import and call in future Python programs.
In fact, any .py
file can be used to define a Python library or module. However, to prevent code outside the functions from executing when the file is imported as a library (instead of being run as a script), we can add an extra condition check.
class Node:
...
def expandAndReturnChildren(...):
...
def bfs(...):
...
if __name__ == '__main__':
state_space = [
...
]
initial_state = 'Arad'
goal_state = 'Bucharest'
print('Solution: ' + str(bfs(state_space, initial_state, goal_state)))
__name__
is a special variable in Python that evaluates to the name of the current module. This variable has the value of '__main__'
if it's called as the main program rather than a module or library.
This part is essentially the main
function in other programming languages like C++ and Java.
Execute the script and resolve any error.
Prep work for Part II
-
Fully understand the code as you will have to modify the code for other problem/search algorithms in future labs.
-
How would you modify the code to run other uninformed search algorithms such as uniform-cost search, depth-first search, etc.? Which part(s) of the code do you need to modify? Think about it before you work on that in next part of this lab
Bonus Opportunity: Machine Learning Onramp Certificate
To complement your learning in this lab and explore applied machine learning techniques, you are encouraged to complete the Machine Learning Onramp course by MathWorks.
- Estimated Time: ~2 hours
- Platform: Online (browser-based)
- Outcome: Digital Certificate of Completion from MathWorks
Action Steps: 1. Access the course here: Machine Learning Onramp – MathWorks Academy 2. Go through all the lessons and complete the interactive activities. 3. Download your Certificate of Completion. 4. Upload your certificate along with your Lab X submission on LMS.
Bonus Recognition: - Students who complete and submit this certificate will be acknowledged later in the semester. - This is optional and will not impact your Lab X grade.