Lab 4 - Game Recommender

This is not available yet. It will be released when it is assigned.

Due Date and Links

Lab due on your scheduled lab day
Lab accepted for full credit until Monday, February 9, 11:59 pm Eastern
Direct autograder link https://autograder.io/web/project/3784

In this lab, you will write functions to complete the implementation of a game recommender. By completing this lab assignment, you will learn to:

Implement functions in Python
Parse and use information from CSV files
Create dictionaries, sets, and lists
Use for loops to process a list
Append items to a list

Requirements

You should complete this lab in small groups of about 4 students.

For all labs in EECS 183, every student must individually submit the lab materials to receive a grade.

Starter Files

Download the starter files.
Extract the games.py, games.csv, and public_tests.py files from starter_files.zip and move them into your EECS_183/labs/lab_4 folder.
Open VS Code in your EECS_183 folder.
You will be doing all your coding in games.py, and this is the only file to be submitted to the autograder. You’ll use public_tests.py to test your code, while games.csv contains the data about video games that your code will read.

Warm Up

Let’s start with a function that gets the year (as an int) from a date string in the format YYYY-MM-DD. Write get_year according to its RME (the REQUIRES, MODIFIES, and EFFECTS comment). Use the provided test_get_year function in public_tests.py to test your code.

Game Information

In this lab, we’ll be working with data about video games. The data is stored in a CSV (Comma Separated Values) file called games.csv. Before we deal with the CSV file itself, let’s see how to capture data about a single game.

Suppose you have game information in a big string like this:

popww_data = "13500,Prince of Persia: Warrior Within,2008-11-21,true,false,false,Very Positive,84,2199,9.99,9.99,0.0,true"

The variable name popww_data is chosen because this is the data for the game “Prince of Persia: Warrior Within”. In general we won’t be able to pick variable names for specific games like this, but it’s useful for this example.

You can see above that each value is separated by a comma. We can use the split method to make this a list of strings:

popww_data = popww_data.split(",")
# popww_data is now: ['13500', 'Prince of Persia: Warrior Within', '2008-11-21', 'true', 'false', 'false', 'Very Positive', '84', '2199', '9.99', '9.99', '0.0', 'true']

To make sense of the above structure, we need to know what is represented by each value in the list. Here is the complete mapping of each index to its meaning:

Index	Meaning
0	app_id
1	title
2	date_release
3	win
4	mac
5	linux
6	rating
7	positive_ratio
8	user_reviews
9	price_final
10	price_original
11	discount
12	steam_deck
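To confirm the mapping in the table for yourself, you can print each value of the split line next to its column name. This is only a sketch: COLUMN_NAMES and fields are hypothetical names (games.py defines just a few index constants), and it uses enumerate, which yields (index, value) pairs and is covered later in the course.

```python
# Column names in index order, copied from the table above.
# COLUMN_NAMES is a hypothetical list for illustration only.
COLUMN_NAMES = [
    "app_id", "title", "date_release", "win", "mac", "linux", "rating",
    "positive_ratio", "user_reviews", "price_final", "price_original",
    "discount", "steam_deck",
]

popww_data = "13500,Prince of Persia: Warrior Within,2008-11-21,true,false,false,Very Positive,84,2199,9.99,9.99,0.0,true"
fields = popww_data.split(",")

# enumerate yields (index, value) pairs, so each value is printed
# alongside its index and the name of the column it belongs to.
for i, value in enumerate(fields):
    print(i, COLUMN_NAMES[i], value)
```

The first line printed is 0 app_id 13500, and every value in the row lines up with one column name.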

Some of these we won’t care about, but let’s define constants for the ones we will use. At the top of games.py, you should see the following constants defined for you:

TITLE_INDEX = 1
DATE_INDEX = 2
WIN_INDEX = 3
MAC_INDEX = 4
RATING_INDEX = 6
PRICE_FINAL_INDEX = 9

With the above, we can access individual pieces of information about the game without using magic numbers. For example, to print the title of the game:

print(popww_data[TITLE_INDEX])

However, it’s often more convenient to put this data into a dictionary. While we’re at it, let’s also convert each value from str to a more suitable type where appropriate. For example:

popww_dict = {}
popww_dict["date_release"] = popww_data[DATE_INDEX]
popww_dict["win"] = popww_data[WIN_INDEX].lower() == "true"
popww_dict["mac"] = popww_data[MAC_INDEX].lower() == "true"
popww_dict["rating"] = popww_data[RATING_INDEX]
popww_dict["price_final"] = float(popww_data[PRICE_FINAL_INDEX])

The above code would create a dictionary popww_dict that looks like this:

{
    "date_release": "2008-11-21",
    "win": True,
    "mac": False,
    "rating": "Very Positive",
    "price_final": 9.99
}

In the code above, the conversion from str to bool (for "win" and "mac") is a little tricky. There is a bool function for type casting, but it returns True for any non-empty string:

print(bool("true"))   # prints: True
print(bool("false"))  # prints: True
print(bool(""))       # prints: False

Instead, we convert the string to lowercase (by calling lower) and then compare the result to "true".

For example, suppose that popww_data[WIN_INDEX] is "True". Then popww_data[WIN_INDEX].lower() is "true", and the comparison "true" == "true" evaluates to the bool value True.
If instead popww_data[WIN_INDEX] is "False", then popww_data[WIN_INDEX].lower() is "false", and the comparison "false" == "true" evaluates to False.

Thus, this bit of code converts a string representation of a boolean to an actual boolean.
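To try this conversion on several inputs at once, the sketch below wraps it in a small helper. The name str_to_bool is hypothetical; it is not part of the starter code, and the lab does not require you to write it.

```python
def str_to_bool(text: str) -> bool:
    """Return True only if text is "true" in any capitalization."""
    # lower() turns "True" or "TRUE" into "true" before the comparison.
    return text.lower() == "true"


for text in ["true", "True", "false", "FALSE", ""]:
    # Compare with bool(text), which is True for every non-empty string.
    print(repr(text), str_to_bool(text), bool(text))
```

Because the comparison is exact, a value with leftover whitespace such as "true\n" produces False even though it means true. This can only happen for a value at the very end of a line, where the newline character sits.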

Suppose we did the same for a second game:

brink_data = "22364,BRINK: Agents of Change,2011-08-03,true,false,false,Positive,85,21,2.99,2.99,0.0,true"
brink_data = brink_data.split(",")
brink_dict = {}
brink_dict["date_release"] = brink_data[DATE_INDEX]
brink_dict["win"] = brink_data[WIN_INDEX].lower() == "true"
brink_dict["mac"] = brink_data[MAC_INDEX].lower() == "true"
brink_dict["rating"] = brink_data[RATING_INDEX]
brink_dict["price_final"] = float(brink_data[PRICE_FINAL_INDEX])

The resulting brink_dict would look like this:

{
    "date_release": "2011-08-03",
    "win": True,
    "mac": False,
    "rating": "Positive",
    "price_final": 2.99
}

We could then put both of these dictionaries into a larger dictionary that maps game titles to their information:

games_dict = {}
games_dict["Prince of Persia: Warrior Within"] = popww_dict
games_dict["BRINK: Agents of Change"] = brink_dict

The resulting games_dict would look like this:

{
    "Prince of Persia: Warrior Within": {
        "date_release": "2008-11-21",
        "win": True,
        "mac": False,
        "rating": "Very Positive",
        "price_final": 9.99
    },
    "BRINK: Agents of Change": {
        "date_release": "2011-08-03",
        "win": True,
        "mac": False,
        "rating": "Positive",
        "price_final": 2.99
    }
}
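Getting a value out of this nested structure takes two lookups: first by title, then by attribute name. The sketch below rebuilds the games_dict shown above as a literal and also loops over it with items(); the variable brink_price is a hypothetical name used only for illustration.

```python
games_dict = {
    "Prince of Persia: Warrior Within": {
        "date_release": "2008-11-21", "win": True, "mac": False,
        "rating": "Very Positive", "price_final": 9.99,
    },
    "BRINK: Agents of Change": {
        "date_release": "2011-08-03", "win": True, "mac": False,
        "rating": "Positive", "price_final": 2.99,
    },
}

# First lookup selects the game's inner dictionary; second selects the attribute.
brink_price = games_dict["BRINK: Agents of Change"]["price_final"]
print(brink_price)  # 2.99

# items() yields (title, info) pairs, where info is the inner dictionary.
for title, info in games_dict.items():
    print(title, info["rating"], info["price_final"])
```

Since Python 3.7, dictionaries keep insertion order, so the loop visits Prince of Persia first and BRINK second.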

To summarize, games_dict is a dictionary that maps game title to a dictionary of information about that game. The dictionary of information about that game maps attribute names (“date_release”, “win”, “mac”, “rating”, and “price_final”) to their corresponding values.

What is a CSV file?

Let’s see how we can build a structure like games_dict from a file containing data about tens of thousands of games.

The file of game data called games.csv essentially stores the data as a table. If you’ve used Excel or Google Sheets, for example, you may be familiar with spreadsheets, which are a common way to represent tabular data. Here’s an example that I’ve opened in Excel (with later rows omitted for brevity):

Spreadsheet Example

The spreadsheet in the image above contains data about video games. Each row corresponds to a single game, and each column corresponds to an attribute of that game. The first row provides the column names.

Excel and Google Sheets store data in their own file formats, which we can’t easily read in Python without extra libraries. For this reason, developers commonly use a plain-text file format called CSV (Comma Separated Values) to represent tabular data. In fact, the games.csv file you already downloaded in the starter files contains the data from the image above.

If you open games.csv in VS Code or any other text editor, you’ll see something like this (again with many rows omitted for brevity):

app_id,title,date_release,win,mac,linux,rating,positive_ratio,user_reviews,price_final,price_original,discount,steam_deck
13500,Prince of Persia: Warrior Within,2008-11-21,true,false,false,Very Positive,84,2199,9.99,9.99,0.0,true
22364,BRINK: Agents of Change,2011-08-03,true,false,false,Positive,85,21,2.99,2.99,0.0,true
113020,Monaco: What's Yours Is Mine,2013-04-24,true,true,true,Very Positive,92,3722,14.99,14.99,0.0,true
226560,Escape Dead Island,2014-11-18,true,false,false,Mixed,61,873,14.99,14.99,0.0,true
249050,Dungeon of the ENDLESS,2014-10-27,true,true,false,Very Positive,88,8784,11.99,11.99,0.0,true
250180,METAL SLUG 3,2015-09-14,true,false,false,Very Positive,90,5579,7.99,7.99,0.0,true
253980,Enclave,2013-10-04,true,true,true,Mostly Positive,75,1608,4.99,4.99,0.0,true
271850,Men of War: Assault Squad 2 - Deluxe Edition upgrade,2014-05-16,true,false,false,Mixed,61,199,6.99,6.99,0.0,true
282900,Hyperdimension Neptunia Re;Birth1,2015-01-29,true,false,false,Very Positive,94,9686,14.99,14.99,0.0,true
19810,The Sum of All Fears,2008-10-10,true,false,false,Mostly Positive,75,33,9.99,9.99,0.0,true

The first row of a CSV file is called the header – it’s the column names. Each subsequent row corresponds to a single game. Each value in a row is separated from the next by a comma. It’s not easy for a human to read a CSV file because the columns aren’t lined up visually, but such files are easily handled by programs. In this lab, you’ll write Python code to process the data in games.csv.
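The header row is also a handy way to double-check the index constants in games.py. The sketch below assumes the header line still ends with a newline character, as it does when read from a text file; column_names is a hypothetical variable name.

```python
header = "app_id,title,date_release,win,mac,linux,rating,positive_ratio,user_reviews,price_final,price_original,discount,steam_deck\n"

# rstrip("\n") removes the newline at the end of the line before splitting.
column_names = header.rstrip("\n").split(",")

# list.index returns the position of the first matching element.
print(column_names.index("title"))        # 1
print(column_names.index("rating"))       # 6
print(column_names.index("price_final"))  # 9
```

Without the rstrip call, the last column name would be "steam_deck\n" rather than "steam_deck".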

build_game_dictionary: Extracting Information from a CSV File

There is a csv module in Python that can make working with CSV files easier. However, for this lab, we will be manually parsing the CSV file to help you better understand how CSV files work.
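As background only (this lab has you parse the file by hand, so don’t use csv in games.py), here is one case the csv module handles that a plain split(",") does not. The CSV format allows a field to be wrapped in double quotes so that it can contain commas. The line below is made up for illustration; this page doesn’t say whether games.csv contains any quoted fields.

```python
import csv

# A made-up CSV line whose second field is quoted because it contains a comma.
line = '1,"Hello, World",2020-01-01'

# split(",") breaks the quoted field into two pieces and keeps the quotes.
naive = line.split(",")
print(naive)   # ['1', '"Hello', ' World"', '2020-01-01']

# csv.reader accepts any iterable of lines and understands quoting.
parsed = next(csv.reader([line]))
print(parsed)  # ['1', 'Hello, World', '2020-01-01']
```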

We’ve provided some code in build_game_dictionary to get you started with reading the CSV file:

    games_dict = {}
    with open(csv_file_name, 'r', encoding='utf-8') as csv_file:
        # skip the header line
        next(csv_file)

        for line in csv_file:
            # Handle one line of data
            # TO DO: Fill in this part of the function
            pass

Python’s built-in next function takes an iterator, such as a file object, and returns its next item; for a file, that item is the next line. Since we call it before looping through the file at all, next returns the header line. We discard that return value because we don’t need the header. The for loop then continues from the line after the header, letting us process each subsequent line (each game) one at a time.
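The same next-then-loop pattern can be tried without games.csv by using io.StringIO, which wraps a string in a file-like object. In this sketch, fake_file stands in for the result of open(csv_file_name, 'r', encoding='utf-8'), and fake_file and rows are hypothetical names. It also shows that each line keeps its trailing newline.

```python
import io

# A three-line stand-in for a CSV file: a header plus two games.
fake_file = io.StringIO(
    "app_id,title,date_release\n"
    "13500,Prince of Persia: Warrior Within,2008-11-21\n"
    "22364,BRINK: Agents of Change,2011-08-03\n"
)

header = next(fake_file)  # consumes only the first line
print(repr(header))       # 'app_id,title,date_release\n'

rows = []
for line in fake_file:    # resumes at the line after the header
    # Each line still ends with "\n"; strip it before splitting.
    fields = line.rstrip("\n").split(",")
    rows.append(fields)

print(rows)
```

The loop never sees the header, because next already consumed it.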

In the TO DO section of the above code, your first task is to add entries to the dictionary games_dict. games_dict should be of the structure described in the section above, where each key is a game title and each value is a dictionary of information about that game. The dictionary of information about that game maps attribute names (“date_release”, “win”, “mac”, “rating”, and “price_final”) to their corresponding values.

You can test your build_game_dictionary function by running public_tests.py – in particular, test_build_game_dictionary. We’ll discuss testing in this lab in more detail in the next section.

public_tests.py

In public_tests.py, study the function compare_expected_actual. This function compares the expected output of a function to the actual output of that function and prints whether they match. We will use this function in several tests. You may find it useful in testing for other course assignments as well.

Next, look at get_csv_subset and consider the following:

For reasons of efficiency, it is often useful to run tests on smaller subsets of a large dataset before trying the entire dataset itself. Thus, the purpose of this function is to return a subset of the provided games dictionary.
You can control the behavior of get_csv_subset using the start_index and count parameters. Consider the first game to be at index 0 (not counting the header). Then, start_index indicates the index of the first game to include in the subset, and count indicates how many games to include in the subset.
In the function’s implementation, first note that after opening the file, the header line is skipped.
The following for loop uses enumerate, which allows us to keep track of the current line number (using the variable i) as we loop through each line of the file. We’ll discuss enumerate in more detail in a future lecture.
Within the loop, a line is processed and added to the subset dictionary only if start_index <= i < start_index + count, that is, if i is at least start_index and less than start_index + count.
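The range check described in the last item can be sketched on its own using a plain list of strings instead of a file. lines_in_range is a hypothetical function; the actual get_csv_subset in public_tests.py reads from the file and builds a dictionary, so its details differ.

```python
def lines_in_range(lines: list[str], start_index: int, count: int) -> list[str]:
    """Return the items whose 0-based index i satisfies
    start_index <= i < start_index + count."""
    subset = []
    for i, line in enumerate(lines):
        # Python allows chained comparisons such as a <= b < c.
        if start_index <= i < start_index + count:
            subset.append(line)
    return subset


games = ["game0", "game1", "game2", "game3", "game4", "game5"]
print(lines_in_range(games, 2, 3))  # ['game2', 'game3', 'game4']
```

If start_index + count runs past the end of the list, the function simply returns the items that exist.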

Next, study the code in test_build_game_dictionary. This function tests your implementation of build_game_dictionary. The “correct answers” used in the tests were determined by hand using the result of get_csv_subset with a start_index of 0 and count of 5 (the value of COUNT). You could test it with other subsets as well, but first you’d have to print out the resulting subset and determine the correct answers by hand, so that you know what to compare your function’s output to.

get_best_cheap_games()

In public_tests.py, look at test_get_best_cheap_games. Its calls to compare_expected_actual use expected results that were determined by hand for start_index = 20 and count = 50. Use this test as you implement get_best_cheap_games according to the RME.

get_most_expensive_game()

Complete get_most_expensive_game according to the RME, again using the tests in test_get_most_expensive_game. Note that the return type for get_most_expensive_game is specified as tuple[str, float]. The following function illustrates that return type:

def get_PF_percentage(score: float, max_score: float) -> tuple[str, float]:
    """
    REQUIRES: max_score > 0
    MODIFIES: Nothing
    EFFECTS: Given a score and the maximum possible score,
             returns a tuple where the first element is "Pass" if the
             percentage score is 60.0 or higher and "Fail" otherwise,
             and the second element is the percentage score.
    """
    percentage = score / max_score * 100
    if percentage >= 60.0:
        return "Pass", percentage
    else:
        return "Fail", percentage

Note that each return statement lists two values separated by a comma. In Python, this is shorthand for returning a tuple containing those two values. A tuple is an ordered collection of values, similar to a list, but unlike a list, a tuple is immutable (it cannot be changed after it is created). If you haven’t seen tuples in lecture yet, you will soon. For now, just know that the return type tuple[str, float] means the function returns two values together: a string followed by a float, in that order.
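A caller can keep the returned tuple whole or unpack it into separate variables. The sketch below repeats a condensed copy of the example function above (with its second parameter named max_score) so that it runs on its own. The inputs 3/4 and 1/4 were chosen because they give exact float results.

```python
def get_PF_percentage(score: float, max_score: float) -> tuple[str, float]:
    """Condensed copy of the example above; requires max_score > 0."""
    percentage = score / max_score * 100
    if percentage >= 60.0:
        return "Pass", percentage
    return "Fail", percentage


# The returned tuple can be stored whole and indexed...
result = get_PF_percentage(3, 4)
print(result)     # ('Pass', 75.0)
print(result[1])  # 75.0

# ...or unpacked into two variables, in the same order as tuple[str, float].
status, percentage = get_PF_percentage(1, 4)
print(status, percentage)  # Fail 25.0
```

Unpacking only works if the number of variables on the left matches the number of values in the tuple; here both are two.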

Submitting to the Autograder

When you have completed the lab, submit only the games.py file to the autograder using the link at the top of this page. As an initial check, make sure your code passes all the tests in public_tests.py before submitting. Note that the autograder will run tests that are similar to but distinct from public_tests.py. It’s possible that your code may pass all the public tests but still fail some of the autograder tests, most likely if your code makes assumptions that are not guaranteed by the RMEs.

Copyright and Academic Integrity

Materials for this assignment were developed with assistance from course staff, including Victoria Shipman.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

All materials provided for this course, including but not limited to labs, projects, notes, and starter code, are the copyrighted intellectual property of the author(s) listed in the copyright notice above. While these materials are licensed for public non-commercial use, this license does not grant you permission to post or republish your solutions to these assignments.

It is strictly prohibited to post, share, or otherwise distribute solution code (in part or in full) in any manner or on any platform, public or private, where it may be accessed by anyone other than the course staff. This includes, but is not limited to:

Public-facing websites (like a personal blog or public GitHub repo).
Solution-sharing websites (like Chegg or Course Hero).
Private collections, archives, or repositories (such as student group “test banks,” club wikis, or shared Google Drives).
Group messaging platforms (like Discord or Slack).

To do so is a violation of the university’s academic integrity policy and will be treated as such.

Asking questions by posting small code snippets to our private course discussion forum is not a violation of this policy.