Lab 9
Due Date and Links
- Lab due on your scheduled lab day
- Lab accepted for full credit until Monday, April 6, 11:59 pm Eastern
- Direct autograder link: https://autograder.io/web/project/3821
In this lab, you will use numpy to implement a few basic machine learning algorithms. You will first design a probabilistic classifier for a game about hidden gnomes, and then you will use linear regression to predict the release year of songs based on a set of audio features.
This lab has three parts:
- The GnomesPlayer class in gnomes.py: this is a toy classification problem, where you will predict the location of a set of hiding gnomes.
- The Evaluator class in regression.py: this evaluates the predictions made by your linear regression model. You will implement the class itself and two evaluation metrics: accuracy, which measures the fraction of predictions that exactly match the correct year, and mean squared error (MSE), which measures how far off the predictions are from the actual release years.
- The LinearRegressor class in regression.py: here you will implement linear regression on a subset of a real machine learning dataset, predicting the year in which a set of songs was released.
NOTE: This work is primarily about computation, not data analysis! Thus, numpy is our tool of choice, not pandas.
Starter Files
You can download the starter files using this link. The starter files include:
- gnomes.py
- gnomes_public_tests.py
- regression.py
- regression_public_tests.py
- YearPredictionMSD_train.txt
- YearPredictionMSD_test.txt
You will submit only your gnomes.py and regression.py files to the autograder.
Tasks to Complete
To complete this lab, you need to do the following steps:
- Implement the following GnomesPlayer methods: GnomesPlayer.fit() and GnomesPlayer.predict()
- Implement the Evaluator class along with the following Evaluator methods: Evaluator.accuracy() and Evaluator.mean_squared_error()
- Implement the following LinearRegressor methods: LinearRegressor.fit() and LinearRegressor.predict()
- Submit your code to the autograder
Much of the code for this lab has already been provided to you. You need not and should not change any of the given starter code. You can use gnomes_public_tests.py to test your implementation of the GnomesPlayer methods, and regression_public_tests.py for the LinearRegressor methods.
Part 1: Find the Gnomes (gnomes.py)
Part 1 is in the context of a game called Find the Gnomes. In this game, there are 100 “nooks”: little out-of-the-way places. The nooks are numbered from 0 to 99. There are also a certain number of gnomes (little mystical creatures). We’ll represent the number of gnomes as \(g\). We don’t know what \(g\) is, but we do know \(2 \le g \le 100\). The gnomes are hiding, and we need to find them. They always hide in a single set of consecutive nooks, one gnome per nook. For example:
- There might be 3 gnomes hiding in nooks 34, 35, and 36.
- Or there might be 3 gnomes hiding in nooks 65, 66, and 67.
- Or there might be 12 gnomes hiding in nooks 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15.
The gnomes will always be in a single set of consecutive nooks, not split into multiple groups or with multiple gnomes in a single nook.
We need to figure out where the gnomes are hiding. Thankfully, we have several Gnest security cameras that monitor some of the nooks. For example, our cameras might tell us the following information:
- Nook 16 does not have a gnome
- Nook 20 has a gnome
- Nook 26 does not have a gnome
- Nook 18 has a gnome
We can use AI algorithms to figure out where the gnomes are hiding. The table below will help you connect the parts of this game to common AI terminology:
| Gnome Game Terminology | AI Terminology |
|---|---|
| a) We have an array of nooks that our cameras have observed, called observed_nooks. | We have an array of inputs, called X_train. |
| b) We have a parallel array of outputs called has_gnome; that is, has_gnome[i] corresponds to observed_nooks[i]. A 1 in has_gnome indicates that a gnome is present at the corresponding nook in the observed_nooks array, while a 0 indicates absence. | We have a parallel array of the outputs that correspond to our inputs, called y_train. |
| c) Together, observed_nooks and has_gnome make a dataset that we can use to make our plan for how to reason about this placement of gnomes. Once we have this plan, we can make predictions about other nooks too – not just the ones in the observed_nooks array. | Together, X_train and y_train make up the training set that we can use to make our artificially intelligent reasoning system – our model. Once the model is trained, it will be able to make predictions for other inputs too – not just the ones in X_train. |
| d) We want our system to be able to take a new array of nook numbers, called other_nooks. Assuming other_nooks comes from the same gnome placements reflected in observed_nooks and has_gnome, we want to create an array of predicted outputs (0’s and 1’s), one for each nook listed in other_nooks. | We want our system to be able to take a new array of inputs, called X_test. Assuming X_test comes from the same data distribution reflected in X_train and y_train, we want our system to create an array of predicted outputs, one for each input in X_test. |
We’re following standard practice in machine-learning terminology by capitalizing any occurrence of X in a name, but not y. This practice comes from the fact that usually X is in a more complex form (a matrix), and so we follow the convention here even though the reasoning behind it doesn’t apply perfectly to part 1 of this lab.
Continuing with our toy example, we have so far:
- Nook 16 does not have a gnome
- Nook 20 has a gnome
- Nook 26 does not have a gnome
- Nook 18 has a gnome
observed_nooks and has_gnome would look like this (lined up for convenience):
observed_nooks = [16, 20, 26, 18]
has_gnome = [ 0, 1, 0, 1]
where 0 means that a nook does not have a gnome, and 1 means that a nook does have a gnome. So according to this dataset, nook 16 does not have a gnome, nook 20 does have a gnome, nook 26 does not have a gnome, and nook 18 does have a gnome.
Our classification algorithm will be as follows:
- During the training phase (you implement this in GnomesPlayer.fit()), the player finds the lowest and highest nook indices in the training data that contain a gnome. These get stored as attributes of the GnomesPlayer class (lowest_gnome_nook and highest_gnome_nook). In the example above, lowest_gnome_nook would be 18 and highest_gnome_nook would be 20.
- During the testing phase (you implement this in GnomesPlayer.predict()), you will predict whether the nooks in the testing data have gnomes or not, as follows:
  - Consider the nooks in the testing data that are between the lowest_gnome_nook and highest_gnome_nook that we learned in the training phase. Because gnomes are always hiding in contiguous nooks, we know that any nooks between these two nook numbers are guaranteed to have gnomes. With the example above, we would predict that Nook 19 does have a gnome, because 19 is within the interval [18, 20].
  - For simplicity, we predict that any nook outside of this nook range does not have a gnome. For example, in the scenario above we would predict that Nook 6 and Nook 25 do not have gnomes, because 6 and 25 are outside the interval [18, 20].
Given this algorithm, implement GnomesPlayer.fit() and GnomesPlayer.predict() according to their respective RMEs.
Use exclusively vectorized / functional techniques! No explicit loops! Both fit and predict can be implemented in 3 or fewer lines of code, without loops.
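As a generic illustration of what “vectorized, no loops” means (the array and interval here are made up for demonstration, not taken from the lab), compare a loop to a single NumPy expression:

```python
import numpy as np

nooks = np.array([6, 17, 19, 25, 30])  # made-up nook numbers

# Loop version (what we want to avoid):
in_range_loop = [1 if 10 <= n <= 20 else 0 for n in nooks]

# Vectorized version: one boolean expression over the whole array,
# converted to 0/1 integers. No explicit loop.
in_range_vec = ((nooks >= 10) & (nooks <= 20)).astype(int)

print(in_range_vec)  # [0 1 1 0 0], same answer as the loop version
```

Note the use of `&` (elementwise “and”) rather than Python’s `and`, which does not work on arrays.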
The main() function in gnomes.py already sets up two example games to help you visualize and debug your implementation. You should not need to change main(). There are further test cases in gnomes_public_tests.py to enable you to verify the correctness of your implementation.
Part 2: Song Year Prediction (regression.py)
In AI, a model is a system trained to recognize patterns in training data so that it can make predictions on testing data (and eventually, real-world data in which the correct answers are not known). In the gnome game, we fit the model by finding the lowest_gnome_nook and highest_gnome_nook from the training data. These two variables were the results that came from the fit process. We call such variables the parameters of the model. That is, the result of fit is the setting of parameters that can then be used in predict.
In the gnome game, the fit process we used (find min and max) was specific to the gnome game. But there are algorithms for fitting a model (learning parameters) that can be applied more generally. Linear regression is one such algorithm. In part 2 of this lab, you will implement linear regression to make predictions about a dataset.
We’ll use a subset of the YearPredictionMSD dataset, which contains audio features of songs and their release years. The task is to predict the release year of a song based on its audio features. The dataset was originally published in 2011 by Thierry Bertin-Mahieux and is available on the UCI Machine Learning Repository.1
We provide the training dataset in YearPredictionMSD_train.txt with 10,000 data points and the testing dataset in YearPredictionMSD_test.txt with 400 data points. The full dataset has 463,715 training examples and 51,630 test examples, but to ensure that regression.py can run in a timely manner on all student laptops, we are using a subset of the data.
Each row in these files has the following format:
- first column: the release year. For each row, this is the ground truth, the correct answer that we are trying to predict. It represents the correct outputs.
- remaining columns: These are various numeric audio features, such as timbre. These represent the inputs – the descriptions of songs. The details of these columns don’t matter for our work here, but if you’re curious, you can read more about this dataset.
Before we go into the linear regression algorithm, we will first introduce some variables and terminology:
- \(X\): represents the inputs: in fit() it represents the training dataset inputs, and in predict() it represents the testing dataset inputs.
- \(y\): represents the ground truth, the correct answers corresponding to \(X\).
- \(\hat{y}\): represents our model’s predictions on the testing dataset - what it thinks are the correct answers.
- \(w\): represents the parameters to be learned through linear regression. The “w” refers to weights, a name for the parameters in linear regression.
Evaluator Class
In this section, you will implement the Evaluator class from scratch in regression.py. You will define a class, write an __init__ method, handle the self parameter, and implement the logic for the class.
Once you have defined the class and its __init__ method, you can simply copy and paste the function headers and RMEs provided in the starter code comments into your class. Just be sure to indent them correctly so they are recognized as methods of the Evaluator class.
After you have created your class, you will implement the logic for the accuracy() and mean_squared_error() methods.
Evaluator: accuracy
One common way to evaluate a model is to see how accurate its predictions are. In our case, this means checking whether the release year that our model predicts is the same as the actual release year of the song.
Intuition for Implementation: Accuracy tells us what fraction of our predictions were “perfect.” To calculate this, you need to find how many times your predicted year matches the actual year exactly, relative to the total number of songs.
For example, suppose the actual release years are:
y_true = [2001, 2003, 2005, 2007]
and your model predicts:
y_pred = [2001, 2002, 2005, 2000]
Your model made 2 exact matches (2001 and 2005) out of 4 predictions, so the accuracy would be:
accuracy = 2 / 4 = 0.5
Hint: If you compare two arrays using ==, NumPy creates an array of True and False values. Since NumPy treats True as 1 and False as 0, you can use np.mean() on that comparison to calculate the fraction of correct matches automatically.
Note: Make sure the value returned from accuracy() is a float.
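The hint above can be seen in action on the toy arrays from the example. This is a sketch of the idea, not necessarily the exact body of your accuracy() method:

```python
import numpy as np

y_true = np.array([2001, 2003, 2005, 2007])
y_pred = np.array([2001, 2002, 2005, 2000])

# Elementwise comparison gives [True, False, True, False].
# np.mean treats True as 1 and False as 0, so the mean of the
# comparison is exactly the fraction of exact matches.
accuracy = float(np.mean(y_true == y_pred))
print(accuracy)  # 0.5
```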
However, this doesn’t give us the full picture of how well our model works. If our test dataset has 10 songs, and the model is wrong for all 10 predictions, but is only one year off of the correct release year, we would report an accuracy of 0%. We would report the same accuracy of 0% if our prediction is 10 years off of the correct release year as well. This should feel strange: a model that is consistently off by 1-2 years is better than a model that is consistently off by 10-12 years, but both get the same accuracy.
Evaluator: mean_squared_error
To address this and help us quantify “how wrong” our predictions are, we can use Mean Squared Error (MSE).
The equation for this is:
\[\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]
Essentially, for every testing data point, we find the difference between the predicted and actual values, square it to ensure that the value is positive, and then average this value across all data points.
To calculate MSE in your mean_squared_error() function, follow these three steps as shown in the formula:
- Find the Error: Subtract y_pred from y_true to get the error.
- Square the Error: Square those differences (using ** 2).
- Find the Mean: Take the mean of the squared errors using np.mean().
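The three steps above, sketched on the same toy arrays used in the accuracy example (your mean_squared_error() may differ in details):

```python
import numpy as np

y_true = np.array([2001, 2003, 2005, 2007])
y_pred = np.array([2001, 2002, 2005, 2000])

errors = y_true - y_pred       # step 1: find the error -> [0, 1, 0, 7]
squared = errors ** 2          # step 2: square the error -> [0, 1, 0, 49]
mse = float(np.mean(squared))  # step 3: take the mean
print(mse)  # 12.5
```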
We then take the square root of MSE to get Root Mean Square Error (RMSE), which tells us on average how much our predictions differ from the actual values.
Note: Make sure the result is returned as a float.
For the entire dataset, we get an MSE of 90.787 and an RMSE of 9.528 with our solution. This means that, on average, our model’s predictions for the release year differ from the actual release years by 9.528 years. For the subset of the dataset that you are using, we get an MSE of 53.300 and an RMSE of 7.301.
LinearRegressor: fit
While a linear regression model can be fit to data using a gradient descent approach, here we will instead use a formula-based approach. The vectorized formula to find the parameters is:
\[w = \left(X^T X \right)^{-1} \left(X^T y\right)\]
We neither require nor expect understanding of this formula for this course. We’ll walk through the parts that you need and let numpy do a lot of the actual calculations.
Note:
- Since we’re talking about fit, \(X\) represents the training dataset inputs.
- In the context of this discussion, we can think of both \(X\) and \(y\) as matrices, represented as numpy arrays.
- The superscript \(T\) represents an operator called the matrix transpose. In numpy, the transpose of a matrix A can be calculated with A.T.
- The superscript \(-1\) represents an operator called the matrix inverse. In numpy, the inverse of a matrix A can be calculated with np.linalg.inv(A).
- Two items next to each other indicate matrix multiplication (similar to multiplication notation in ordinary algebra). In numpy, matrix multiplication can be done with the @ operator. For example, if A and B are matrices, then A @ B represents their product.
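To see these three operators working together, here is the formula applied to a tiny made-up dataset (not the lab data): three examples, two columns, where the second column’s relationship to \(y\) is exactly \(y = 1 + 2x\).

```python
import numpy as np

# Made-up training data: first column is the intercept column of ones,
# second column is a single feature x = 1, 2, 3.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])   # exactly y = 1 + 2*x

# w = (X^T X)^{-1} (X^T y), written with .T, np.linalg.inv, and @
w = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(w)  # close to [1. 2.]

# Predictions use the same @ operator:
y_hat = X @ w
print(y_hat)  # close to [3. 5. 7.]
```

Because the toy data is exactly linear, the learned weights recover the intercept 1 and slope 2 (up to floating-point error).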
Use the above formula to implement LinearRegressor.fit(). Make sure you set the attribute w accordingly, so that it can be used in predict.
LinearRegressor: predict
In order to make predictions, we can use the following equation:
\(\hat{y}\) = \(Xw\)
Note:
- Since we’re talking about predict, \(X\) represents the testing dataset inputs.
Use the above formula to implement LinearRegressor.predict().
What main() does in regression.py
You do not need to implement main() in regression.py, but it is helpful to understand what it does:
- Loads the training data from YearPredictionMSD_train.txt and the testing data from YearPredictionMSD_test.txt into numpy arrays.
- Splits each dataset into y_train/y_test (the ground-truth year column) and X_train/X_test (the input feature columns).
- Calls add_intercept(X) to add a column of ones as the first column in X_train and X_test. This is common practice in linear regression.
- Creates an object of the LinearRegressor class.
- Calls fit(X_train, y_train) to calculate the weights for linear regression.
- Calls predict(X_test) to get predicted years for the test dataset.
- Computes evaluation metrics as described above.
- Prints a few sample (predicted year, actual year) pairs so you can view the model’s predictions.
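The add_intercept step can be pictured like this. This is only a sketch of the idea (the name add_intercept_sketch is ours); the starter code already provides the real function, so you do not need to write this yourself:

```python
import numpy as np

def add_intercept_sketch(X):
    """Prepend a column of ones to a 2-D feature matrix."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

# Two made-up rows of two features each:
X = np.array([[0.5, 1.2],
              [0.8, 0.3]])
print(add_intercept_sketch(X))
# [[1.  0.5 1.2]
#  [1.  0.8 0.3]]
```

The column of ones lets the first weight in \(w\) act as a constant offset (the intercept) in the prediction \(Xw\).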
Running Your Code
- You can run gnomes.py directly to see how your predictions worked on the sample games.
- You can run regression.py directly to evaluate how well your linear regressor worked on the YearPredictionMSD dataset. Make sure the YearPredictionMSD_train.txt and YearPredictionMSD_test.txt files are in the Lab 9 directory.
NOTE: If you see a FileNotFoundError while trying to run any of your code, follow the steps on this webpage to troubleshoot.
Expected output for gnomes.py:
```
========== Game #1 ==========
--- GnomesPlayer ---
Before training: GnomesPlayer[untrained]
After training: GnomesPlayer[13, 18]
predictions: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Actual, predicted
17: 1, 1
12: 1, 0
27: 0, 0
22: 1, 0
5: 0, 0
24: 0, 0
21: 1, 0
4: 0, 0
9: 0, 0
6: 0, 0
Right: 7, Wrong: 3
Accuracy (higher is better): 0.7

========== Game #2 ==========
--- GnomesPlayer ---
Before training: GnomesPlayer[untrained]
After training: GnomesPlayer[4, 19]
predictions: [1, 0, 0, 1, 0, 1, 1, 1]
Actual, predicted
15: 1, 1
23: 0, 0
22: 0, 0
5: 1, 1
20: 1, 0
12: 1, 1
14: 1, 1
17: 1, 1
Right: 7, Wrong: 1
Accuracy (higher is better): 0.875
```
Expected output for regression.py:
```
=== Loading dataset into numpy array ===
=== Linear Regression: predict release year ===
Test MSE: 53.300
Test RMSE: 7.301
Test accuracy: 7.25%
Sample predictions (predicted year, actual year):
1995, 2007
2002, 2003
2001, 2005
2002, 2003
2003, 2005
1999, 2007
2002, 2003
2001, 2003
2000, 2003
2002, 2005
```
Testing Your Code
We have provided public tests for GnomesPlayer.fit() and GnomesPlayer.predict() in gnomes_public_tests.py, and public tests for the regression classes in regression_public_tests.py. We encourage you to run these test files periodically to check the correctness of your implementations. We recommend running tests after you write each function and debugging when needed along the way, rather than writing everything and running the tests only at the end.
How to Submit
- When ready, submit to the autograder. You will submit your gnomes.py and regression.py files only.
IMPORTANT: For all labs in EECS 183, to receive a grade, every student must individually submit the Lab Submission. Late submissions for Labs will not be accepted for credit. For this lab, you will receive ten submissions per day with feedback.
- Once you receive a grade of 10 out of 10 points from the autograder, you will have received full credit for this lab.
1 Bertin-Mahieux, T. (2011). Year Prediction MSD [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C50K61.
Copyright and Academic Integrity
© 2026 Steven Bogaerts.
Materials for this assignment were developed with assistance from course staff, including Krithika Venkatasubramanian and Leanne Cheng.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
All materials provided for this course, including but not limited to labs, projects, notes, and starter code, are the copyrighted intellectual property of the author(s) listed in the copyright notice above. While these materials are licensed for public non-commercial use, this license does not grant you permission to post or republish your solutions to these assignments.
It is strictly prohibited to post, share, or otherwise distribute solution code (in part or in full) in any manner or on any platform, public or private, where it may be accessed by anyone other than the course staff. This includes, but is not limited to:
- Public-facing websites (like a personal blog or public GitHub repo).
- Solution-sharing websites (like Chegg or Course Hero).
- Private collections, archives, or repositories (such as student group “test banks,” club wikis, or shared Google Drives).
- Group messaging platforms (like Discord or Slack).
To do so is a violation of the university’s academic integrity policy and will be treated as such.
Asking questions by posting small code snippets to our private course discussion forum is not a violation of this policy.