Lab 9
Due Date and Links
- Lab due on your scheduled lab day
- Lab accepted for full credit until Monday, April 6, 11:59 pm Eastern
- Direct autograder link: https://autograder.io/web/project/3821
In this lab, you will use numpy to implement a few basic machine learning algorithms. You will first design a probabilistic classifier for a game about hidden gnomes, and then you will use linear regression to predict the release year of songs based on a set of audio features.
This lab has three parts:
- The GnomesPlayer class in gnomes.py: this is a toy classification problem, where you will predict the location of a set of hiding gnomes.
- The Evaluator class in regression.py: this evaluates the predictions made by your linear regression model. You will implement the class itself and two evaluation metrics: accuracy, which measures the fraction of predictions that exactly match the correct year, and mean squared error (MSE), which measures how far off the predictions are from the actual release years.
- The LinearRegressor class in regression.py: here you will implement linear regression on a subset of a real machine learning dataset, predicting the year in which a set of songs was released.
NOTE: This work is primarily about computation, not data analysis! Thus, numpy is our tool of choice, not pandas.
Starter Files
You can download the starter files using this link. The starter files include:
- gnomes.py
- gnomes_public_tests.py
- regression.py
- regression_public_tests.py
- YearPredictionMSD_train.txt
- YearPredictionMSD_test.txt
You will submit only your gnomes.py and regression.py files to the autograder.
Tasks to Complete
To complete this lab, you need to do the following steps:
- Implement the following GnomesPlayer methods: GnomesPlayer.fit() and GnomesPlayer.predict()
- Implement the Evaluator class along with the following Evaluator methods: Evaluator.accuracy() and Evaluator.mean_squared_error()
- Implement the following LinearRegressor methods: LinearRegressor.fit() and LinearRegressor.predict()
- Submit your code to the autograder
Much of the code for this lab has already been provided to you. You need not and should not change any of the given starter code. You can use gnomes_public_tests.py to test your implementation of the GnomesPlayer methods, and regression_public_tests.py for the LinearRegressor methods.
Part 1: Find the Gnomes (gnomes.py)
Part 1 is in the context of a game called Find the Gnomes. In this game, there are 100 “nooks”: little out-of-the-way places. The nooks are numbered from 0 to 99. There are also a certain number of gnomes (little mystical creatures). We’ll represent the number of gnomes as \(g\). We don’t know what \(g\) is, but we do know \(2 \le g \le 100\). The gnomes are hiding, and we need to find them. They always hide in a single set of consecutive nooks, one gnome per nook. For example:
- There might be 3 gnomes hiding in nooks 34, 35, and 36.
- Or there might be 3 gnomes hiding in nooks 65, 66, and 67.
- Or there might be 12 gnomes hiding in nooks 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15.
The gnomes will always be in a single set of consecutive nooks, not split into multiple groups or with multiple gnomes in a single nook.
We need to figure out where the gnomes are hiding. Thankfully, we have several Gnest security cameras that monitor some of the nooks. For example, our cameras might tell us the following information:
- Nook 16 does not have a gnome
- Nook 20 has a gnome
- Nook 26 does not have a gnome
- Nook 18 has a gnome
We can use AI algorithms to figure out where the gnomes are hiding. The table below will help you connect the parts of this game to common AI terminology:
| Gnome Game Terminology | AI Terminology |
|---|---|
| a) We have an array of nooks that our cameras have observed, called observed_nooks. | We have an array of inputs, called X_train. |
| b) We have a parallel array of outputs called has_gnome; that is, has_gnome[i] corresponds to observed_nooks[i]. A 1 in has_gnome indicates that a gnome is present at the corresponding nook in the observed_nooks array, while a 0 indicates absence. | We have a parallel array of the outputs that correspond to our inputs, called y_train. |
| c) Together, observed_nooks and has_gnome make a dataset that we can use to make our plan for how to reason about this placement of gnomes. Once we have this plan, we can make predictions about other nooks too – not just the ones in the observed_nooks array. | Together, X_train and y_train make up the training set that we can use to make our artificially intelligent reasoning system – our model. Once the model is trained, it will be able to make predictions for other inputs too – not just the ones in X_train. |
| d) We want our system to be able to take a new array of nook numbers, called other_nooks. Assuming other_nooks comes from the same gnome placements reflected in observed_nooks and has_gnome, we want to create an array of predicted outputs (0’s and 1’s), one for each nook listed in other_nooks. | We want our system to be able to take a new array of inputs, called X_test. Assuming X_test comes from the same data distribution reflected in X_train and y_train, we want our system to create an array of predicted outputs, one for each input in X_test. |
We’re following standard practice in machine-learning terminology by capitalizing any occurrence of X in a name, but not y. This practice comes from the fact that usually X is in a more complex form (a matrix), and so we follow the convention here even though the reasoning behind it doesn’t apply perfectly to part 1 of this lab.
Continuing with our toy example, we have so far:
- Nook 16 does not have a gnome
- Nook 20 has a gnome
- Nook 26 does not have a gnome
- Nook 18 has a gnome
observed_nooks and has_gnome would look like this (lined up for convenience):
observed_nooks = [16, 20, 26, 18]
has_gnome = [ 0, 1, 0, 1]
where 0 means that a nook does not have a gnome, and 1 means that a nook does have a gnome. So according to this dataset, nook 16 does not have a gnome, nook 20 does have a gnome, nook 26 does not have a gnome, and nook 18 does have a gnome.
Our classification algorithm will be as follows:
- During the training phase (you implement this in GnomesPlayer.fit()), the player finds the lowest and highest nook indices in the training data that contain a gnome. These get stored as attributes of the GnomesPlayer class (lowest_gnome_nook and highest_gnome_nook). In the example above, lowest_gnome_nook would be 18 and highest_gnome_nook would be 20.
- During the testing phase (you implement this in GnomesPlayer.predict()), you will predict whether the nooks in the testing data have gnomes or not, as follows:
  - Consider the nooks in the testing data that are between the lowest_gnome_nook and highest_gnome_nook that we learned in the training phase. Because gnomes are always hiding in contiguous nooks, we know that any nooks between these two nook numbers are guaranteed to have gnomes. With the example above, we would predict that Nook 19 does have a gnome, because 19 is within the interval [18, 20].
  - For simplicity, we predict that any nook outside of this nook range does not have a gnome. For example, in the scenario above we would predict that Nook 6 and Nook 25 do not have gnomes, because 6 and 25 are outside the interval [18, 20].
Given this algorithm, implement GnomesPlayer.fit() and GnomesPlayer.predict() according to their respective RMEs.
Use exclusively vectorized / functional techniques! No explicit loops! Both fit and predict can be implemented in 3 or fewer lines of code, without loops.
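As a generic illustration of what “vectorized, no loops” means (the array and interval here are made up for demonstration, not taken from the lab), compare a loop to a single NumPy expression:

```python
import numpy as np

nooks = np.array([6, 17, 19, 25, 30])  # made-up nook numbers

# Loop version (what we want to avoid):
in_range_loop = [1 if 10 <= n <= 20 else 0 for n in nooks]

# Vectorized version: one boolean expression over the whole array,
# converted to 0/1 integers. No explicit loop.
in_range_vec = ((nooks >= 10) & (nooks <= 20)).astype(int)

print(in_range_vec)  # [0 1 1 0 0], same answer as the loop version
```

Note the use of `&` (elementwise “and”) rather than Python’s `and`, which does not work on arrays.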
The main() function in gnomes.py already sets up two example games to help you visualize and debug your implementation. You should not need to change main(). There are further test cases in gnomes_public_tests.py to enable you to verify the correctness of your implementation.
Part 2: Song Year Prediction (regression.py)
In AI, a model is a system trained to recognize patterns in training data so that it can make predictions on testing data (and eventually, real-world data in which the correct answers are not known). In the gnome game, we fit the model by finding the lowest_gnome_nook and highest_gnome_nook from the training data. These two variables were the results that came from the fit process. We call such variables the parameters of the model. That is, the result of fit is the setting of parameters that can then be used in predict.
In the gnome game, the fit process we used (find min and max) was specific to the gnome game. But there are algorithms for fitting a model (learning parameters) that can be applied more generally. Linear regression is one such algorithm. In part 2 of this lab, you will implement linear regression to make predictions about a dataset.
We’ll use a subset of the YearPredictionMSD dataset, which contains audio features of songs and their release years. The task is to predict the release year of a song based on its audio features. The dataset was originally published in 2011 by Thierry Bertin-Mahieux and is available on the UCI Machine Learning Repository.1
We provide the training dataset in YearPredictionMSD_train.txt with 10,000 data points and the testing dataset in YearPredictionMSD_test.txt with 400 data points. The full dataset has 463,715 training examples and 51,630 test examples, but to ensure that regression.py can run in a timely manner on all student laptops, we are using a subset of the data.
Each row in these files has the following format:
- first column: the release year. For each row, this is the ground truth, the correct answer that we are trying to predict. It represents the correct outputs.
- remaining columns: These are various numeric audio features, such as timbre. These represent the inputs – the descriptions of songs. The details of these columns don’t matter for our work here, but if you’re curious, you can read more about this dataset.
Before we go into the linear regression algorithm, we will first introduce some variables and terminology:
- \(X\): represents the inputs: in fit() it represents the training dataset inputs, and in predict() it represents the testing dataset inputs.
- \(y\): represents the ground truth, the correct answers corresponding to \(X\).
- \(\hat{y}\): represents our model’s predictions on the testing dataset - what it thinks are the correct answers.
- \(w\): represents the parameters to be learned through linear regression. The “w” refers to weights, a name for the parameters in linear regression.
Evaluator Class
In this section, you will implement the Evaluator class from scratch in regression.py. You will define a class, write an __init__ method, handle the self parameter, and implement the logic for the class.
Once you have defined the class and its __init__ method, you can simply copy and paste the function headers and RMEs provided in the starter code comments into your class. Just be sure to indent them correctly so they are recognized as methods of the Evaluator class.
After you have created your class, you will implement the logic for the accuracy() and mean_squared_error() methods.
Evaluator: accuracy
One common way to evaluate a model is to see how accurate its predictions are. In our case, this means checking whether the release year that our model predicts is the same as the actual release year of the song.
Intuition for Implementation: Accuracy tells us what fraction of our predictions were “perfect.” To calculate this, you need to find how many times your predicted year matches the actual year exactly, relative to the total number of songs.
For example, suppose the actual release years are:
y_true = [2001, 2003, 2005, 2007]
and your model predicts:
y_pred = [2001, 2002, 2005, 2000]
Your model made 2 exact matches (2001 and 2005) out of 4 predictions, so the accuracy would be:
accuracy = 2 / 4 = 0.5
Hint: If you compare two arrays using ==, NumPy creates an array of True and False values. Since NumPy treats True as 1 and False as 0, you can use np.mean() on that comparison to calculate the fraction of correct matches automatically.
Note: Make sure the value returned from accuracy() is a float.
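The hint above can be seen in action on the toy arrays from the example. This is a sketch of the idea, not necessarily the exact body of your accuracy() method:

```python
import numpy as np

y_true = np.array([2001, 2003, 2005, 2007])
y_pred = np.array([2001, 2002, 2005, 2000])

# Elementwise comparison gives [True, False, True, False].
# np.mean treats True as 1 and False as 0, so the mean of the
# comparison is exactly the fraction of exact matches.
accuracy = float(np.mean(y_true == y_pred))
print(accuracy)  # 0.5
```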
However, this doesn’t give us the full picture of how well our model works. If our test dataset has 10 songs, and the model is wrong for all 10 predictions, but is only one year off of the correct release year, we would report an accuracy of 0%. We would report the same accuracy of 0% if our prediction is 10 years off of the correct release year as well. This should feel strange: a model that is consistently off by 1-2 years is better than a model that is consistently off by 10-12 years, but both get the same accuracy.
Evaluator: mean_squared_error
To address this and help us quantify “how wrong” our predictions are, we can use Mean Squared Error (MSE).
The equation for this is:
\[\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]
Essentially, for every testing data point, we find the difference between the predicted and actual values, square it to ensure that the value is positive, and then average this value across all data points.
To calculate MSE in your mean_squared_error() function, follow these three steps as shown in the formula:
- Find the Error: Subtract y_pred from y_true to get the error.
- Square the Error: Square those differences (using ** 2).
- Find the Mean: Take the mean of the squared errors using np.mean().
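The three steps above, sketched on the same toy arrays used in the accuracy example (your mean_squared_error() may differ in details):

```python
import numpy as np

y_true = np.array([2001, 2003, 2005, 2007])
y_pred = np.array([2001, 2002, 2005, 2000])

errors = y_true - y_pred       # step 1: find the error -> [0, 1, 0, 7]
squared = errors ** 2          # step 2: square the error -> [0, 1, 0, 49]
mse = float(np.mean(squared))  # step 3: take the mean
print(mse)  # 12.5
```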
We then take the square root of MSE to get Root Mean Square Error (RMSE), which tells us on average how much our predictions differ from the actual values.
Note: Make sure the result is returned as a float.
For the entire dataset, we get an MSE of 90.787 and an RMSE of 9.528 with our solution. This means that, on average, our model’s predictions for the release year differ from the actual release years by 9.528 years. For the subset of the dataset that you are using, we get an MSE of 53.300 and an RMSE of 7.301.
LinearRegressor: fit
While a linear regression model can be fit to data using a gradient descent approach, here we will instead use a formula-based approach. The vectorized formula to find the parameters is:
\[w = \left(X^T X \right)^{-1} \left(X^T y\right)\]
We neither require nor expect understanding of this formula for this course. We’ll walk through the parts that you need and let numpy do a lot of the actual calculations.
Note:
- Since we’re talking about fit, \(X\) represents the training dataset inputs.
- In the context of this discussion, we can think of both \(X\) and \(y\) as matrices, represented as numpy arrays.
- The superscript \(T\) represents an operator called the matrix transpose. In numpy, the transpose of a matrix A can be calculated with A.T.
- The superscript \(-1\) represents an operator called the matrix inverse. In numpy, the inverse of a matrix A can be calculated with np.linalg.inv(A).
- Two items next to each other indicate matrix multiplication (similar to multiplication notation in ordinary algebra). In numpy, matrix multiplication can be done with the @ operator. For example, if A and B are matrices, then A @ B represents their product.
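To see these three operators working together, here is the formula applied to a tiny made-up dataset (not the lab data): three examples, two columns, where the second column’s relationship to \(y\) is exactly \(y = 1 + 2x\).

```python
import numpy as np

# Made-up training data: first column is the intercept column of ones,
# second column is a single feature x = 1, 2, 3.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])   # exactly y = 1 + 2*x

# w = (X^T X)^{-1} (X^T y), written with .T, np.linalg.inv, and @
w = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(w)  # close to [1. 2.]

# Predictions use the same @ operator:
y_hat = X @ w
print(y_hat)  # close to [3. 5. 7.]
```

Because the toy data is exactly linear, the learned weights recover the intercept 1 and slope 2 (up to floating-point error).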
Use the above formula to implement LinearRegressor.fit(). Make sure you set the attribute w accordingly, so that it can be used in predict.
LinearRegressor: predict
In order to make predictions, we can use the following equation:
\(\hat{y}\) = \(Xw\)
Note:
- Since we’re talking about predict, \(X\) represents the testing dataset inputs.
Use the above formula to implement LinearRegressor.predict().
What main() does in regression.py
You do not need to implement main() in regression.py, but it is helpful to understand what it does:
- Loads the training data from YearPredictionMSD_train.txt and the testing data from YearPredictionMSD_test.txt into numpy arrays.
- Splits each dataset into y_train/y_test (the ground-truth year column) and X_train/X_test (the input feature columns).
- Calls add_intercept(X) to add a column of ones as the first column in X_train and X_test. This is common practice in linear regression.
- Creates an object of the LinearRegressor class.
- Calls fit(X_train, y_train) to calculate the weights for linear regression.
- Calls predict(X_test) to get predicted years for the test dataset.
- Computes evaluation metrics as described above.
- Prints a few sample (predicted year, actual year) pairs so you can view the model’s predictions.
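The add_intercept step can be pictured like this. This is only a sketch of the idea (the name add_intercept_sketch is ours); the starter code already provides the real function, so you do not need to write this yourself:

```python
import numpy as np

def add_intercept_sketch(X):
    """Prepend a column of ones to a 2-D feature matrix."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

# Two made-up rows of two features each:
X = np.array([[0.5, 1.2],
              [0.8, 0.3]])
print(add_intercept_sketch(X))
# [[1.  0.5 1.2]
#  [1.  0.8 0.3]]
```

The column of ones lets the first weight in \(w\) act as a constant offset (the intercept) in the prediction \(Xw\).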
Running Your Code
- You can run gnomes.py directly to see how your predictions worked on the sample games.
- You can run regression.py directly to evaluate how well your linear regressor worked on the YearPredictionMSD dataset. Make sure the YearPredictionMSD_train.txt and YearPredictionMSD_test.txt files are in the Lab 9 directory.
NOTE: If you see a FileNotFoundError while trying to run any of your code, follow the steps on this webpage to troubleshoot.
Expected output for gnomes.py:
```
========== Game #1 ==========
--- GnomesPlayer ---
Before training: GnomesPlayer[untrained]
After training: GnomesPlayer[13, 18]
predictions: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Actual, predicted
17: 1, 1
12: 1, 0
27: 0, 0
22: 1, 0
5: 0, 0
24: 0, 0
21: 1, 0
4: 0, 0
9: 0, 0
6: 0, 0
Right: 7, Wrong: 3
Accuracy (higher is better): 0.7

========== Game #2 ==========
--- GnomesPlayer ---
Before training: GnomesPlayer[untrained]
After training: GnomesPlayer[4, 19]
predictions: [1, 0, 0, 1, 0, 1, 1, 1]
Actual, predicted
15: 1, 1
23: 0, 0
22: 0, 0
5: 1, 1
20: 1, 0
12: 1, 1
14: 1, 1
17: 1, 1
Right: 7, Wrong: 1
Accuracy (higher is better): 0.875
```
Expected output for regression.py:
```
=== Loading dataset into numpy array ===
=== Linear Regression: predict release year ===
Test MSE: 53.300
Test RMSE: 7.301
Test accuracy: 7.25%
Sample predictions (predicted year, actual year):
1995, 2007
2002, 2003
2001, 2005
2002, 2003
2003, 2005
1999, 2007
2002, 2003
2001, 2003
2000, 2003
2002, 2005
```
Testing Your Code
We have provided public tests for GnomesPlayer.fit() and GnomesPlayer.predict() in gnomes_public_tests.py, and public tests for the regression classes in regression_public_tests.py. We encourage you to run these test files periodically to check the correctness of your implementations. We recommend running tests after you write each function and debugging when needed along the way, rather than writing everything and running the tests only at the end.
How to Submit
- When ready, submit to the autograder. You will submit your gnomes.py and regression.py files only.
IMPORTANT: For all labs in EECS 183, to receive a grade, every student must individually submit the Lab Submission. Late submissions for Labs will not be accepted for credit. For this lab, you will receive ten submissions per day with feedback.
- Once you receive a grade of 10 out of 10 points from the autograder, you will have received full credit for this lab.
1 Bertin-Mahieux, T. (2011). Year Prediction MSD [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C50K61.
Copyright and Academic Integrity
© 2026 Steven Bogaerts.
Materials for this assignment were developed with assistance from course staff, including Krithika Venkatasubramanian and Leanne Cheng.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
All materials provided for this course, including but not limited to labs, projects, notes, and starter code, are the copyrighted intellectual property of the author(s) listed in the copyright notice above. While these materials are licensed for public non-commercial use, this license does not grant you permission to post or republish your solutions to these assignments.
It is strictly prohibited to post, share, or otherwise distribute solution code (in part or in full) in any manner or on any platform, public or private, where it may be accessed by anyone other than the course staff. This includes, but is not limited to:
- Public-facing websites (like a personal blog or public GitHub repo).
- Solution-sharing websites (like Chegg or Course Hero).
- Private collections, archives, or repositories (such as student group “test banks,” club wikis, or shared Google Drives).
- Group messaging platforms (like Discord or Slack).
To do so is a violation of the university’s academic integrity policy and will be treated as such.
Asking questions by posting small code snippets to our private course discussion forum is not a violation of this policy.