Assignment 3

Goals

The goal of this assignment is to work with lists and dictionaries in Python.

Instructions

You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.9 or higher for your work. To use tiger, use the credentials you received. If you work remotely, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda is the easiest way to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application or via the command-line as jupyter-lab.

In this assignment, we will be working with data from the United States Department of Agriculture’s FoodData Central. Rather than using this dataset directly, we have created a subset of this data, which can be read as a list of dictionaries. That data is located here, and we have created a template notebook, a3.ipynb, that contains a cell that will download and read that data. You can minor-click (what right-handed people often call "right-click") and save-as the a3.ipynb file, and, if working on tiger, upload that file. Once loaded, the data is a list of dictionaries where each dictionary has nine key-value pairs. Those keys and a brief description are:

  • fdc_id: a unique identifier assigned by FoodData Central
  • brand_owner: the company that makes the product
  • brand_name: a brand name, if different from the company
  • description: the product’s name or description
  • branded_food_category: the category for the food product
  • ingredients: a comma-separated string of ingredients in the product
  • serving_size: the serving size of the product in the units specified by serving_size_unit
  • serving_size_unit: the units for the serving size value
  • nutrition: a list of dictionaries containing nutrition information; each dictionary contains the keys name, amount, and unit_name and their associated values
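
To make the nesting concrete, here is a minimal sketch of how one record might be laid out and accessed. The record below is hypothetical (the values are invented, not taken from the real dataset), and the unit key is spelled as it appears in the Part 4 sample output.

```python
# Hypothetical one-record sample mimicking the layout described above.
# All values are invented for illustration only.
sample = [
    {
        "fdc_id": 1,
        "brand_owner": "Example Foods",
        "brand_name": None,
        "description": "EXAMPLE CRACKERS",
        "branded_food_category": "Crackers",
        "ingredients": "WHEAT FLOUR, SALT",
        "serving_size": 30.0,
        "serving_size_unit": "g",
        # The nutrition value is itself a list of small dictionaries.
        "nutrition": [
            {"name": "Fiber", "amount": 2.0, "unit_name": "G"},
            {"name": "Sodium", "amount": 400.0, "unit_name": "MG"},
        ],
    },
]

first = sample[0]                      # one food item (a dictionary)
print(first["description"])            # a top-level value
for entry in first["nutrition"]:       # walk the nested list
    print(entry["name"], entry["amount"])
```

The outer list is indexed by position, each item is indexed by key, and the nutrition entries need a second loop because they are a list inside the dictionary.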

You will be answering queries and writing functions to help analyze this data. You may not use external libraries including statistics, collections, datetime, or pandas for this assignment.

Due Date

The assignment is due at 11:59pm on Thursday, February 24.

Submission

You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a3.ipynb.

Details

Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells.

0. Name & Z-ID (5 pts)

The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.

1. Serving Size Units (5 pts)

Find all of the possible values for serving size units. List each type only once!

Hints
  • Iterate through all of the list elements, and extract the serving size unit from each element
  • Use a set
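
As a rough illustration of the set hint, the sketch below collects the distinct values of one key from a list of dictionaries. The records and the key name color are hypothetical stand-ins, not part of the assignment data.

```python
# Hypothetical toy records; only the key of interest is included.
records = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]

unique_colors = set()
for record in records:
    # Adding a value that is already present leaves the set unchanged.
    unique_colors.add(record["color"])

print(unique_colors)
```

Because a set stores each value at most once, no explicit duplicate check is needed. Note that sets are unordered, so the printed order may vary.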

2. Largest Serving Size (10 pts)

Write code to find the food item in the dataset with the largest serving size among those in grams (serving size unit is ‘g’). Output the name (description) and brand_owner of the food item. Remember that you will need to iterate through each element of the list, and each element is a dictionary which has various keys including serving_size_unit and description.

3. Category Counts (10 pts)

Write code to create a dictionary, category_counts, that keeps track of how many items each food category (branded_food_category) has listed in our sample dataset. Next, use this dictionary to find and display the name of the category that has the largest number of items.

Hints
  • The count_letters example from class may be useful
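
For reference, one common way to count occurrences with a plain dictionary is sketched below on a hypothetical list of words. This is a general pattern and may differ from the count_letters example shown in class.

```python
# Hypothetical toy data for illustration.
words = ["apple", "pear", "apple", "fig", "pear", "apple"]

counts = {}
for word in words:
    # dict.get returns 0 the first time a word is seen.
    counts[word] = counts.get(word, 0) + 1

# Find the key with the largest count by scanning the dictionary.
most_common = None
for key, count in counts.items():
    if most_common is None or count > counts[most_common]:
        most_common = key

print(counts)
print(most_common)
```

If two keys tie for the largest count, this loop keeps the one it encountered first.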

4. Add Unsaturated Fat (15 pts)

Update the list of each food item’s nutrition information to include the amount of unsaturated fat. This can be computed by subtracting the amount of saturated fat from the amount of total fat. You will need to add a new dictionary to the list of nutrition information. The values for the keys name and unit_name should be “Unsaturated Fat” and “G”, respectively. The amount is what you are computing via the subtraction. After computing this for all items, an item would, for example, now look like this:

{'fdc_id': 374367,
 'brand_owner': 'Swift-Eckrich Inc.',
 'brand_name': None,
 'description': 'PEPPERONI',
 'branded_food_category': 'Pepperoni, Salami & Cold Cuts',
 'ingredients': 'PORK, BEEF, SALT, CONTAINS 2% OR LESS OF FLAVORINGS, LACTIC ACID STARTER CULTURE, OLEORESIN OF PAPRIKA, SODIUM NITRITE, SPICES, SUGAR, BHA, BHT, CITRIC ACID.',
 'serving_size': 28.0,
 'serving_size_unit': 'g',
 'nutrition': [{'name': 'Fiber', 'amount': 0.0, 'unit_name': 'G'},
  {'name': 'Saturated Fat', 'amount': 14.29, 'unit_name': 'G'},
  {'name': 'Carbohydrates', 'amount': 3.57, 'unit_name': 'G'},
  {'name': 'Sodium', 'amount': 1786.0, 'unit_name': 'MG'},
  {'name': 'Total Fat', 'amount': 39.29, 'unit_name': 'G'},
  {'name': 'Protein', 'amount': 21.43, 'unit_name': 'G'},
  {'name': 'Calories', 'amount': 464.0, 'unit_name': 'KCAL'},
  {'name': 'Sugar', 'amount': 0.0, 'unit_name': 'G'},
  {'name': 'Unsaturated Fat', 'amount': 25.0, 'unit_name': 'G'}]}

Hints
  • Check specifically for a None value in either amount, and set the result of the subtraction to None whenever either value is None.
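
The None check in the hint can be sketched as follows. The helper name difference_or_none and the toy entries are hypothetical; the point is only the pattern of testing with "is None" before doing arithmetic and then appending a new dictionary to an existing list.

```python
def difference_or_none(total, part):
    """Return total - part, or None if either value is missing."""
    if total is None or part is None:
        return None
    return total - part

# Hypothetical toy entries shaped like name/amount dictionaries.
readings = [
    {"name": "Total", "amount": 10.0},
    {"name": "Part", "amount": 4.0},
]

# Look up amounts by name rather than by position in the list.
amounts = {}
for reading in readings:
    amounts[reading["name"]] = reading["amount"]

# dict.get returns None for a missing name, which the helper handles.
readings.append({
    "name": "Remainder",
    "amount": difference_or_none(amounts.get("Total"), amounts.get("Part")),
})
print(readings[-1])
```

Comparing with "is None" rather than relying on truthiness matters here because an amount of 0.0 is a valid value, not a missing one.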

5. Filter by Fiber Range (15 pts)

Write a function filter_by_fiber that takes two arguments, min_fiber and max_fiber, and returns a list of food items whose amount of fiber is in the specified range, inclusive. For each item, you will need to find the Fiber listing in the nutrition list. Do not assume that item will be in a particular index of the list! Then, test whether the item’s amount of fiber is in the specified range, only including it in the returned list if it satisfies the condition. For example, the list comprehension [d['description'] for d in filter_by_fiber(6.3,6.35)] should evaluate to:

['SISTERS FRUIT COMPANY, RED DELICIOUS SLICED APPLE CHIPS, LIGHT & CRISPY',
 'PINTO BEANS',
 'VANILLA ALMOND PREMIUM NATURALLY FLAVORED GRANOLA',
 "BUSH'S Red Beans in a Mild Chili Sauce 16 oz",
 'FRUIT & NUT GRANOLA, FRUIT & NUT',
 'VANILLA ALMOND WARM VANILLA FLAVOR PERFECTLY MIXED WITH SWEET HONEY AND SATISFYING ALMONDS PREMIUM GRANOLA, VANILLA ALMOND']

Hints
  • Python allows chained comparisons
  • For each item, you don’t need to check any other entries in the nutrition information once you’ve found the one for fiber
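
Both hints can be illustrated on hypothetical toy data: a chained comparison tests an inclusive range in one expression, and break stops a search loop as soon as the wanted entry is found. The names in_range, entries, and checked are made up for this sketch.

```python
def in_range(value, low, high):
    """Inclusive range test written as a chained comparison."""
    return low <= value <= high

# Hypothetical toy entries; the wanted one is not at a fixed index.
entries = [
    {"name": "Width", "amount": 3.0},
    {"name": "Height", "amount": 7.5},
    {"name": "Depth", "amount": 1.0},
]

height = None
checked = 0
for entry in entries:
    checked += 1
    if entry["name"] == "Height":
        height = entry["amount"]
        break  # no need to look at the remaining entries

print(height, checked, in_range(height, 7.5, 8.0))
```

The expression low <= value <= high behaves like (low <= value) and (value <= high), with value evaluated only once.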

6. [CSCI 503 Only] Filter by Ingredients (15 pts)

Only CSCI 503 students need to complete this part. CSCI 490 students may complete it for extra credit.

Write a function filter_by_ingredients that will filter the food items by their ingredients. Specifically, given an ingredient (e.g. “Apple”), return the food items that have that ingredient. Note that you should do a case-insensitive comparison so the ingredient “apple” should return food items that list “APPLE”, “Apple”, “apple”, etc. Do not worry about “apple” also matching “pineapple” (this is extra credit). For example, len(filter_by_ingredients('apple')) should evaluate to 605 and the list comprehension [d['description'] for d in filter_by_ingredients('saffron')] should evaluate to:

['WILD MUSHROOM & TRUFFLE',
 'ARTICHOKE HEARTS',
 'FLAN DESSERT MIX',
 'CHICKEN TIKKA MASALA WITH SAFFRON RICE, MEDIUM',
 'CON AZAFRAN SEASONING',
 'SEASONED YELLOWRICE']

Hints
  • Make sure to handle the case where the item has no ingredients (e.g. the value is None)
  • Make sure your comparison allows case differences in both the argument and the ingredients
  • str.upper may help
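
A minimal sketch of a case-insensitive substring test that also tolerates None is shown below. The helper name contains_text is hypothetical and the strings are invented examples.

```python
def contains_text(haystack, needle):
    """Case-insensitive substring test that treats None as 'no match'."""
    if haystack is None:
        return False
    # Normalize both sides so case differences in either one are ignored.
    return needle.upper() in haystack.upper()

print(contains_text("WATER, Apple Juice", "apple"))
print(contains_text(None, "apple"))
print(contains_text("PINEAPPLE, SUGAR", "Apple"))
```

Because this is a plain substring test, the last call also matches, which is the “apple” versus “pineapple” behavior that Part 6 says not to worry about.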

Extra Credit

  • CSCI 490 students may complete Part 6 for extra credit.
  • Update Part 6 so that it differentiates between individual ingredients. Then, for example, “apple” should not match “pineapple”.
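
One possible approach, sketched under the assumption that each comma-separated piece of the ingredients string counts as one ingredient, is to split the string and compare whole pieces. The helper name has_exact_ingredient is hypothetical, and real ingredient strings may contain parentheses or phrases that this simple split does not handle.

```python
def has_exact_ingredient(ingredients, target):
    """True if target equals one whole comma-separated ingredient."""
    if ingredients is None:
        return False
    wanted = target.strip().upper()
    # Trim surrounding spaces and periods from each piece before comparing.
    parts = [part.strip(" .").upper() for part in ingredients.split(",")]
    return wanted in parts

print(has_exact_ingredient("PINEAPPLE, SUGAR.", "apple"))
print(has_exact_ingredient("Apple, Sugar", "apple"))
```

Under this assumption, “apple” would also not match a piece such as “APPLE JUICE”. Whether that is desired is a design decision the assignment leaves open; matching on individual words instead would be an alternative.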