From 066b1ef570d2230b6b8e6b53625262e8d72e0954 Mon Sep 17 00:00:00 2001
From: Donal Heidenblad
Date: Sun, 19 Feb 2023 14:26:00 -0500
Subject: [PATCH 1/2] Renamed files for consistency and to fix typos

---
 Chapter01/{Excercises Ch1.ipynb => Chapter 1 Exercises.ipynb} | 0
 Chapter02/{Excercises Ch2.ipynb => Chapter 2 Exercises.ipynb} | 0
 Chapter03/{Excercises Ch3.ipynb => Chapter 3 Exercises.ipynb} | 0
 Chapter04/{Excercises Ch4.ipynb => Chapter 4 Exercises.ipynb} | 0
 .../{Chaper 5 Excercises.ipynb => Chapter 5 Exercises.ipynb} | 0
 .../{Chaper 6 Excercises.ipynb => Chapter 6 Exercises.ipynb} | 0
 .../{Chaper 7 Excercises.ipynb => Chapter 7 Exercises.ipynb} | 0
 .../{Chaper 8 Excercises.ipynb => Chapter 8 Exercises.ipynb} | 0
 .../{Chaper 9 Excercises.ipynb => Chapter 9 Exercises.ipynb} | 0
 .../{Chaper 10 Excercises.ipynb => Chapter 10 Exercises.ipynb} | 0
 .../{Chaper 11 Excercises.ipynb => Chapter 11 Exercises.ipynb} | 0
 .../{Chaper 12 Excercises.ipynb => Chapter 12 Exercises.ipynb} | 0
 .../{Chapter 13 Excercises.ipynb => Chapter 13 Exercises.ipynb} | 0
 .../{Chaper 14 Excercises.ipynb => Chapter 14 Exercises.ipynb} | 0
 14 files changed, 0 insertions(+), 0 deletions(-)
 rename Chapter01/{Excercises Ch1.ipynb => Chapter 1 Exercises.ipynb} (100%)
 rename Chapter02/{Excercises Ch2.ipynb => Chapter 2 Exercises.ipynb} (100%)
 rename Chapter03/{Excercises Ch3.ipynb => Chapter 3 Exercises.ipynb} (100%)
 rename Chapter04/{Excercises Ch4.ipynb => Chapter 4 Exercises.ipynb} (100%)
 rename Chapter05/{Chaper 5 Excercises.ipynb => Chapter 5 Exercises.ipynb} (100%)
 rename Chapter06/{Chaper 6 Excercises.ipynb => Chapter 6 Exercises.ipynb} (100%)
 rename Chapter07/{Chaper 7 Excercises.ipynb => Chapter 7 Exercises.ipynb} (100%)
 rename Chapter08/{Chaper 8 Excercises.ipynb => Chapter 8 Exercises.ipynb} (100%)
 rename Chapter09/{Chaper 9 Excercises.ipynb => Chapter 9 Exercises.ipynb} (100%)
 rename Chapter10/{Chaper 10 Excercises.ipynb => Chapter 10 Exercises.ipynb} (100%)
 rename Chapter11/{Chaper 11 Excercises.ipynb => Chapter 11 Exercises.ipynb} (100%)
 rename Chapter12/{Chaper 12 Excercises.ipynb => Chapter 12 Exercises.ipynb} (100%)
 rename Chapter13/{Chapter 13 Excercises.ipynb => Chapter 13 Exercises.ipynb} (100%)
 rename Chapter14/{Chaper 14 Excercises.ipynb => Chapter 14 Exercises.ipynb} (100%)

diff --git a/Chapter01/Excercises Ch1.ipynb b/Chapter01/Chapter 1 Exercises.ipynb
similarity index 100%
rename from Chapter01/Excercises Ch1.ipynb
rename to Chapter01/Chapter 1 Exercises.ipynb
diff --git a/Chapter02/Excercises Ch2.ipynb b/Chapter02/Chapter 2 Exercises.ipynb
similarity index 100%
rename from Chapter02/Excercises Ch2.ipynb
rename to Chapter02/Chapter 2 Exercises.ipynb
diff --git a/Chapter03/Excercises Ch3.ipynb b/Chapter03/Chapter 3 Exercises.ipynb
similarity index 100%
rename from Chapter03/Excercises Ch3.ipynb
rename to Chapter03/Chapter 3 Exercises.ipynb
diff --git a/Chapter04/Excercises Ch4.ipynb b/Chapter04/Chapter 4 Exercises.ipynb
similarity index 100%
rename from Chapter04/Excercises Ch4.ipynb
rename to Chapter04/Chapter 4 Exercises.ipynb
diff --git a/Chapter05/Chaper 5 Excercises.ipynb b/Chapter05/Chapter 5 Exercises.ipynb
similarity index 100%
rename from Chapter05/Chaper 5 Excercises.ipynb
rename to Chapter05/Chapter 5 Exercises.ipynb
diff --git a/Chapter06/Chaper 6 Excercises.ipynb b/Chapter06/Chapter 6 Exercises.ipynb
similarity index 100%
rename from Chapter06/Chaper 6 Excercises.ipynb
rename to Chapter06/Chapter 6 Exercises.ipynb
diff --git a/Chapter07/Chaper 7 Excercises.ipynb b/Chapter07/Chapter 7
Exercises.ipynb similarity index 100% rename from Chapter07/Chaper 7 Excercises.ipynb rename to Chapter07/Chapter 7 Exercises.ipynb diff --git a/Chapter08/Chaper 8 Excercises.ipynb b/Chapter08/Chapter 8 Exercises.ipynb similarity index 100% rename from Chapter08/Chaper 8 Excercises.ipynb rename to Chapter08/Chapter 8 Exercises.ipynb diff --git a/Chapter09/Chaper 9 Excercises.ipynb b/Chapter09/Chapter 9 Exercises.ipynb similarity index 100% rename from Chapter09/Chaper 9 Excercises.ipynb rename to Chapter09/Chapter 9 Exercises.ipynb diff --git a/Chapter10/Chaper 10 Excercises.ipynb b/Chapter10/Chapter 10 Exercises.ipynb similarity index 100% rename from Chapter10/Chaper 10 Excercises.ipynb rename to Chapter10/Chapter 10 Exercises.ipynb diff --git a/Chapter11/Chaper 11 Excercises.ipynb b/Chapter11/Chapter 11 Exercises.ipynb similarity index 100% rename from Chapter11/Chaper 11 Excercises.ipynb rename to Chapter11/Chapter 11 Exercises.ipynb diff --git a/Chapter12/Chaper 12 Excercises.ipynb b/Chapter12/Chapter 12 Exercises.ipynb similarity index 100% rename from Chapter12/Chaper 12 Excercises.ipynb rename to Chapter12/Chapter 12 Exercises.ipynb diff --git a/Chapter13/Chapter 13 Excercises.ipynb b/Chapter13/Chapter 13 Exercises.ipynb similarity index 100% rename from Chapter13/Chapter 13 Excercises.ipynb rename to Chapter13/Chapter 13 Exercises.ipynb diff --git a/Chapter14/Chaper 14 Excercises.ipynb b/Chapter14/Chapter 14 Exercises.ipynb similarity index 100% rename from Chapter14/Chaper 14 Excercises.ipynb rename to Chapter14/Chapter 14 Exercises.ipynb From 3133a1ff82015364764b7e7b881e1ea4018fddd3 Mon Sep 17 00:00:00 2001 From: Donal Heidenblad Date: Sun, 19 Feb 2023 14:38:09 -0500 Subject: [PATCH 2/2] Renamed headers: Excercise -> Exercise --- Chapter01/Chapter 1 Exercises.ipynb | 10 +++++----- Chapter02/Chapter 2 Exercises.ipynb | 6 +++--- Chapter03/Chapter 3 Exercises.ipynb | 12 ++++++------ Chapter04/Chapter 4 Exercises.ipynb | 12 ++++++------ Chapter05/Chapter 5 Exercises.ipynb | 16 ++++++++-------- Chapter06/Chapter 6 Exercises.ipynb | 6 +++--- Chapter07/Chapter 7 Exercises.ipynb | 8 ++++---- Chapter08/Chapter 8 Exercises.ipynb | 10 +++++----- Chapter09/Chapter 9 Exercises.ipynb | 6 +++--- Chapter10/Chapter 10 Exercises.ipynb | 8 ++++---- Chapter11/Chapter 11 Exercises.ipynb | 20 ++++++++++---------- Chapter12/Chapter 12 Exercises.ipynb | 18 +++++++++--------- Chapter13/Chapter 13 Exercises.ipynb | 24 ++++++++++++------------ Chapter14/Chapter 14 Exercises.ipynb | 28 ++++++++++++++-------------- 14 files changed, 92 insertions(+), 92 deletions(-) diff --git a/Chapter01/Chapter 1 Exercises.ipynb b/Chapter01/Chapter 1 Exercises.ipynb index cc65c1b..a063214 100644 --- a/Chapter01/Chapter 1 Exercises.ipynb +++ b/Chapter01/Chapter 1 Exercises.ipynb @@ -16,7 +16,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 1\n", + "##### Exercise 1\n", "Use the adult.csv dataset and run the codes shown in the following Screenshots. Then answer the questions." ] }, @@ -70,7 +70,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 2 \n", + "##### Exercise 2 \n", "\n", "For adult_df use the .groupby() function to run the following code and create the multi-index Series mlt_sr." ] @@ -295,7 +295,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 3\n", + "##### Exercise 3\n", "For this exercise you need to use a new dataset: billboard.csv. Visit https://www.billboard.com/charts/hot-100 and see the latest song rankings of the day. 
This dataset presents information and ranking of 317 song tracks in 80 columns. The first four columns are artist, track, time, and date_e. The first columns are intuitive descriptions of song tracks. The column date_e shows the date that the songs entered the hot-100 list. The rest of 76 columns are songs ranking at the end of each weeks from 'w1' to 'w76'. Download and read this dataset using pandas and answer the following questions." ] }, @@ -431,7 +431,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 4 \n", + "##### Exercise 4 \n", "\n", "We will use LaqnData.csv for this exercise. Each row of this dataset shows an hourly measurement recording of one of the five following air pollutants: NO, NO2, NOX, PM10, and PM2.5. The data was collected in a location in Londan for the entirety of year 2017. Read the data using Pandas and perform the following tasks." ] @@ -653,7 +653,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 5 \n", + "##### Exercise 5 \n", "\n", "We will continue working with LaqnData.csv. \n", "\n", diff --git a/Chapter02/Chapter 2 Exercises.ipynb b/Chapter02/Chapter 2 Exercises.ipynb index ebbbd51..812f1e6 100644 --- a/Chapter02/Chapter 2 Exercises.ipynb +++ b/Chapter02/Chapter 2 Exercises.ipynb @@ -25,7 +25,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 1\n", + "##### Exercise 1\n", "Use adult.csv and Boolean Masking to answer the following questions. " ] }, @@ -242,7 +242,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 2 \n", + "##### Exercise 2 \n", " a)\tRepeat the analysis on Exercise 1. a), but this time use groupby function. \n", " b)\tb) compare the runtime of using BM vs. groupby. (hint: you can import the module time and use the fuction .time()) \n" ] @@ -265,7 +265,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 3 \n", + "##### Exercise 3 \n", "\n", " If you have not already, solve exercise 4 in the previous chapter. After you created pvt_df for Exercises 4, run the following code.\n" ] diff --git a/Chapter03/Chapter 3 Exercises.ipynb b/Chapter03/Chapter 3 Exercises.ipynb index 52bb8f8..2570334 100644 --- a/Chapter03/Chapter 3 Exercises.ipynb +++ b/Chapter03/Chapter 3 Exercises.ipynb @@ -16,7 +16,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 1\n", + "##### Exercise 1\n", "1)\tFrom 5 colleagues or classmates ask to provide a definition for the term data. \n", "\n", " a)\tReport these definitions and indicate the similarity among them. \n", @@ -36,7 +36,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 2\n", + "##### Exercise 2\n", "\n", "For this exercise, we are going to use covid_impact_on_airport_traffic.csv. Answer the following questions. This dataset is from Kaggle.com, use this link to see its page: https://www.kaggle.com/terenceshin/covid19s-impact-on-airport-traffic.\n", "The key attribute of this dataset is PercentOfBaseline which shows the ratio of air traffic in the specific day compared to pre-pandemic time (1st Feb to 15th March 2020)" @@ -335,7 +335,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 3 \n", + "##### Exercise 3 \n", "\n", "For this exercise, we are going to use US_Accidents.csv. Answer the following questions. This dataset is from Kaggle.com, use this link to see its page: https://www.kaggle.com/sobhanmoosavi/us-accidents.\n", "This dataset shows all the car accidents in the US from February 2016 to Dec 2020. 
\n", @@ -769,7 +769,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 4 \n", + "##### Exercise 4 \n", "\n", "For this exercise, we are going to use fatal-police-shootings-data.csv. There are a lot of debates, discussions, dialogues, and protests happening in the US surrounding police killings. The Washington Post has been collecting data on all fatal police shootings in the US. The dataset available to the government and the public alike has date, age, gender, race, location, and other situational information of these fatal police shootings. You can read more about this data on https://www.washingtonpost.com/graphics/investigations/police-shootings-database/, and you can download the last version of the data from https://github.com/washingtonpost/data-police-shootings" ] @@ -980,7 +980,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 5\n", + "##### Exercise 5\n", "For this exercise, we will be using electricity_prediction.csv. The screenshot below shows the 5 rows of this dataset and a linear regression model created to predict electricity consumption based on the weekday and daily average temperature. " ] }, @@ -1137,7 +1137,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 6\n", + "##### Exercise 6\n", "For this exercise, we will be using adult.csv. we used this dataset extensively in chapter 1. Read the dataset using Padans and call it adult_df." ] }, diff --git a/Chapter04/Chapter 4 Exercises.ipynb b/Chapter04/Chapter 4 Exercises.ipynb index 49cf749..49e4e0b 100644 --- a/Chapter04/Chapter 4 Exercises.ipynb +++ b/Chapter04/Chapter 4 Exercises.ipynb @@ -16,7 +16,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 1\n", + "##### Exercise 1\n", "In your own words, describe the difference between a dataset and a database. \n" ] }, @@ -31,7 +31,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 2\n", + "##### Exercise 2\n", "What are the advantages and disadvantages of structuring data for a relational database? Mention at least two advantages and two disadvantages. Use examples to elucidate. " ] }, @@ -54,7 +54,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 3 \n", + "##### Exercise 3 \n", "\n", "In this chapter, we were introduced to 4 different types of databases: relational databases, unstructured databases, distributed databases, and blockchain. \n", "\n", @@ -118,7 +118,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 4 \n", + "##### Exercise 4 \n", "In this chapter, we were introduced to five different methods of connecting to databases: direct connection, webpage connection, API connection, request connection, and publicly shared. Use the following table to indicate a ranking for each of the five methods of connecting to databases based on the specified criteria. Study the rankings and provides reasoning for why they are correct." ] }, @@ -176,7 +176,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 5\n", + "##### Exercise 5\n", "Using the Chinook database as a sample, we want to investigate and find an answer to the following question: Do tracks that are titled using positive words sell better on average than tracks that are titled with negative words. We would like to only focus on the following words in the investigations. 
\n", "\n", "- List of negative words: ['Evil', 'Night', 'Problem', 'Sorrow', 'Dead', 'Curse', 'Venom', 'Pain', 'Lonely', 'Beast']\n", @@ -246,7 +246,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Excercise 6\n", + "##### Exercise 6\n", "In the year 2020, which of the following 12 stocks experienced the highest growth. \n", "\n", "Stocks: [‘Baba’, ‘NVR’, ‘AAPL’, ‘NFLX’, ‘FB’, ‘SBUX’, ‘NOW’, ‘AMZN’, ‘GOOGL’, ‘MSFT’, ‘FDX’, ‘TSLA’]\n", diff --git a/Chapter05/Chapter 5 Exercises.ipynb b/Chapter05/Chapter 5 Exercises.ipynb index a1fdf08..52e3a03 100644 --- a/Chapter05/Chapter 5 Exercises.ipynb +++ b/Chapter05/Chapter 5 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 5: Data Visualization \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -30,7 +30,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "In this exercise, we will be using Universities_imputed_reduced.csv. Draw the following described visualizations.\n", "\n", " a.\tUse boxplots to compare the student to faculty ratio (stud./fac. ratio) for the two population public and private universities.\n", @@ -233,7 +233,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "\n", "In this exercise, we will continue using Universities_imputed_reduced.csv. Draw the following described visualizations.\n", "\n", @@ -288,7 +288,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "\n", "For this example, we will be using WH Report_preprocessed.csv. Draw the following described visualizations.\n", "\n", @@ -352,7 +352,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 4\n", + "# Exercise 4\n", "\n", "For this exercise, we will continue using WH Report_preprocessed.csv. Draw the following described visualizations.\n", "\n", @@ -392,7 +392,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 5\n", + "# Exercise 5\n", "\n", "For this exercise, we will be using whickham.csv. Draw the following described visualizations.\n", "\n", @@ -587,7 +587,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 6\n", + "# Exercise 6\n", "\n", "For this exercise, we will be using WH Report_preprocessed.csv. \n", "\n", @@ -637,7 +637,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 7\n", + "# Exercise 7\n", "\n", "For this exercise, we will continue using WH Report_preprocessed.csv. \n", "\n", diff --git a/Chapter06/Chapter 6 Exercises.ipynb b/Chapter06/Chapter 6 Exercises.ipynb index c1cc610..bbb226c 100644 --- a/Chapter06/Chapter 6 Exercises.ipynb +++ b/Chapter06/Chapter 6 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 6: Prediction \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -30,7 +30,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "“MLP has the potential to create prediction models that are more accurate than predictions models that are created by linear regression.” This statement is generally correct. In this exercise, we want to explore one of the reasons why the statement is correct. Answer the following questions.\n", "\n", " a) The following formula shows the linear equation that we used to connect the dependent and independent attributes of the MSU number of applications problem. 
Count and report the number of coefficients that Linear Regression can play with to fit the equation to the data. \n", @@ -53,7 +53,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "2.\tIn this exercise, we will be using ToyotaCorolla_preprocessed.csv. This dataset has the following columns: Age, Milage_KM, Quarterly_Tax, Weight, \tFuel_Type_CNG, Fuel_Type_Diesel, Fuel_Type_Petrol, and Price. Each data object in this dataset is a used Toyota Corolla car. We would like to use this dataset to predict the price of used Toyota Corolla cars. \n" ] }, diff --git a/Chapter07/Chapter 7 Exercises.ipynb b/Chapter07/Chapter 7 Exercises.ipynb index 79b7b71..9278836 100644 --- a/Chapter07/Chapter 7 Exercises.ipynb +++ b/Chapter07/Chapter 7 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 7: Classification \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -30,7 +30,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "The chapter asserts that before using KNN you will need to have your independent attributes normalized. This is certainly true, but how come we were able to get away with no-normalization when we performed KNN using visualization? See Figure 7.3. \n" ] }, @@ -45,7 +45,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "We did not normalize the data when applying the Decision Tree to the Loan Application problem. For practice and deeper understanding, apply the Decision Tree to the normalized data, and answer the following questions. " ] }, @@ -88,7 +88,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "For this exercise, we are going to use the Customer Churn.csv. This dataset is randomly collected from an Iranian telecom company’s database over a period of 12 months. A total of 3150 rows of data, each representing a customer, bear information for 13 columns. The attributes that are in this dataset are listed below:\n", " \n", " Call Failures: number of call failures\n", diff --git a/Chapter08/Chapter 8 Exercises.ipynb b/Chapter08/Chapter 8 Exercises.ipynb index 414444c..b66380c 100644 --- a/Chapter08/Chapter 8 Exercises.ipynb +++ b/Chapter08/Chapter 8 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 8: Clustering Analysis\n", - "#### Excercises" + "#### Exercises" ] }, { @@ -29,7 +29,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "In your own words, answer the following two questions. Use at most 200 words, to answer each question.\n", "\n", " a.\tWhat is the difference between Classification and Prediction?\n", @@ -47,7 +47,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "Consider Figure 8.6 regarding the necessity of normalization before performing Clustering analysis. With this new appreciation you developed in this chapter, would you like to change your answer to the first exercise question from the previous chapter?\n" ] }, @@ -62,7 +62,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "In this chapter, we used WH Report_preprocessed.csv to form meaningful clusters of countries only using 2019 data. In this exercise, we want to use the data of all the years 2010-2019. Perform the following steps to do this." 
] }, @@ -126,7 +126,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 4\n", + "# Exercise 4\n", "For this exercise we will be using the dataset Mall_Customers.xlsx to form 4 meaningful clusters of customers. The following steps will help you to do this correctly. " ] }, diff --git a/Chapter09/Chapter 9 Exercises.ipynb b/Chapter09/Chapter 9 Exercises.ipynb index 82e94f8..555044b 100644 --- a/Chapter09/Chapter 9 Exercises.ipynb +++ b/Chapter09/Chapter 9 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 9: Data Cleaning - Levels Ⅰ and Ⅱ \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -28,7 +28,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "In your own words describe the relationship between analytics goals and data cleaning. Your response should answer the following questions.\n", "\n", " a.\t Is data cleaning a separate step of data analytics and can be done in isolation? In other words, can data cleaning be performed without knowing about the analytics?\n", @@ -47,7 +47,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "\n", "A local airport to analyze the usage of its parking has employed a Single Beam Infrared Detector (SBID) technology to count the number of people who pass the gate from the parking to the airport. \n", "As shown in the following figure, an SBDI records the time every time the infrared connection is blocked signaling the entrance or the exit of a passenger." diff --git a/Chapter10/Chapter 10 Exercises.ipynb b/Chapter10/Chapter 10 Exercises.ipynb index 47cb315..d2bf11b 100644 --- a/Chapter10/Chapter 10 Exercises.ipynb +++ b/Chapter10/Chapter 10 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 10: Data Cleaning Level Ⅱ- Unpack, restructure, and reformulate the table\n", - "#### Excercises" + "#### Exercises" ] }, { @@ -28,7 +28,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "This question is regarding the difference between dataset reformulation and dataset restructuring. Answer the following questions.\n", "\n", " a.\tIn your own words described the difference between dataset reformulation and dataset restructuring. \n", @@ -47,7 +47,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "For this exercise, we will be using 'LaqnData.csv' which is collected from the London Air website (https://www.londonair.org.uk/LondonAir/Default.aspx) and include the hourly readings of 5 air particles (NO, NO2, NOX, PM2.5, and PM10) from a specific cite. Perform the following steps for this dataset." ] }, @@ -129,7 +129,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "In this exercise, we will be using stock_index.csv. This file has hourly data for stock indices Nasdaq, S&P, and Dow Jones from Nov 7, 2019, until June 10th, 2021. Each row of data represents an hour of the trading day, and each row is described by the opening value, closing value and the volume for each of the three mentioned stock indices. 
The opening value is the value of the index at the beginning of the hour, the closing value is the value of the index at the end of the hour, and volume is the amount of trading that happened in that hour\n", "\n", "In this exercise, we would like to perform a clustering analysis to understand how many different types of trading days we experience during 2020. Using the following attributes that are calculable from stock_df.csv we’d like to use K-Means to cluster the stock trading days of 2020 into 4 clusters. \n", diff --git a/Chapter11/Chapter 11 Exercises.ipynb b/Chapter11/Chapter 11 Exercises.ipynb index d9ece74..f259ddd 100644 --- a/Chapter11/Chapter 11 Exercises.ipynb +++ b/Chapter11/Chapter 11 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 11: Data Cleaning - Level ⅠⅡ \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -29,7 +29,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "In this exercise, we will be using 'Temperature_data.csv'. This dataset has some missing values. Do the following.\n", "\n", " a. After reading the file into a Pandas DataFrame, check if the dataset is level Ⅰ clean and if not clean it. Also, describe the cleanings if any." @@ -268,7 +268,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "In this exercise, we are going to use the file ‘Iris_wMV.csv’. Iris data includes 50 samples of three types of iris flowers, totaling 150 rows of data. Each flower is described by its sepal and petal length or width. The column PetalLengthCm has some missing values.\n", "\n", " a. Confirm that PetalLengthCm has five missing values. " @@ -330,7 +330,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "In this exercise, we will be using ‘imdb_top_1000.csv’. More information about this dataset maybe found on this link: https://www.kaggle.com/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows. Perform the following steps for this dataset. \n", "\n", " a.\tRead the file into movie_df, and list the level Ⅰ data cleaning steps that the dataset needs. Implement the listed items, if any. " @@ -535,7 +535,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 4\n", + "# Exercise 4\n", "In this exercise, we will be using two CSV files: responses.csv and columns.csv. The two files are used to record the date of a survey conducted in Slovakia. To access the data on Kaggle.com use this link: https://www.kaggle.com/miroslavsabo/young-people-survey. Perform the following items for this data source. " ] }, @@ -811,7 +811,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 5\n", + "# Exercise 5\n", "One of the most common approaches for fraud detection is using outlier detection. In this exercise, you will use 'creditcard.csv' from https://www.kaggle.com/mlg-ulb/creditcardfraud to evaluate the effectiveness of outlier detection for credit card fraud detection. Pay attention that most of the columns in this data source are processed values to uphold data anonymity. Perform the following steps.\n", "\n", " a.\tCheck the state of the dataset for missing values and address them if any." @@ -894,7 +894,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 6\n", + "# Exercise 6\n", "In Chapter 5 and Chapter 8 we used ‘WH Report_preprocessed.csv’ which is the preprocessed version of ‘WH Report.csv’. 
Now that you have learned numerous data preprocessing skills, you will be preprocessing the dataset yourself.\n", "\n", " a.\tCheck the status of the dataset for missing values. " @@ -984,7 +984,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 7\n", + "# Exercise 7\n", "\n", "Specify if the following items describe random errors or systematic errors.\n", "\n", @@ -1005,7 +1005,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 8\n", + "# Exercise 8\n", "Study Figure 11.14 one more time, and run the first three Exercises by the flowchart in this figure and note down the path that led to our decisions regarding the missing values. Did we take steps in dealing with missing values that were not listed in this figure or this chapter? Would it be better to have a more complex figure so every possibility would be included, or not? Why or why not?" ] }, @@ -1020,7 +1020,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 9\n", + "# Exercise 9\n", "Explain why the following statement is incorrect: A row may have a significant number of MCAR missing values." ] }, diff --git a/Chapter12/Chapter 12 Exercises.ipynb b/Chapter12/Chapter 12 Exercises.ipynb index 5021882..4eeca4d 100644 --- a/Chapter12/Chapter 12 Exercises.ipynb +++ b/Chapter12/Chapter 12 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 12: Data Fusion & Data Integration \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -28,7 +28,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "In your own words, what is the difference between Data Fusion and Data Integration? Give examples other than the ones in this chapter. \n" ] }, @@ -43,7 +43,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "Answer the following question about **Challenge 4: Aggregation mismatch**. Is this challenge a data fusion one, a data integration, or both? Explain." ] }, @@ -58,7 +58,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "How come **Challenge 2: Unwise data collection** is somehow both a data cleaning step and a data integration? Do you think it is essential that we categorize if an unwise data collection should be under data cleaning or data integration? " ] }, @@ -73,7 +73,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 4\n", + "# Exercise 4\n", "In Example 1 of this chapter, we used multi-level indexing using Date and Hour to overcome the index mismatched formatting challenge. For this exercise, repeat this example but this time use a single level indexing using python DataTime object." ] }, @@ -88,7 +88,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 5\n", + "# Exercise 5\n", "Recreate **Figure 5.23** from **Chapter 5 Data Visualization**, but this time instead of using *WH Report_preprocessed.csv*, integrate the following three files yourself first: *WH Report.csv*, *populations.csv*, and *Countires.csv*. Hint: information about happiness indices come from *WH Report.csv*, information of the countries content comes from *Countires.csv*, and population information comes from *populations.csv*. " ] }, @@ -103,7 +103,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 6\n", + "# Exercise 6\n", "In **Chapter 6, Exercise 2**, we used *ToyotaCorolla_preprocessed.csv* to create a model that predicts the price of cars. 
In this exercise, we want to do the preprocessing ourselves. Use *ToyotaCorolla.csv* to perform the following steps.\n", "\n", " a.\tAre there any concerns regarding Level Ⅰ data cleaning? If yes, address them if necessary. \n", @@ -125,7 +125,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 7\n", + "# Exercise 7\n", "We would like to use the file *Universities.csv* to cluster the universities into two meaningful clusters. However, the data source has many issues including data cleaning levels Ⅰ - Ⅲ and data redundancy. Perform the following steps.\n", "\n", " a.\tDeal with data cleaning issues\n", @@ -146,7 +146,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 8\n", + "# Exercise 8\n", "\n", "In this exercise, we will see an example of data fusion. The case study that we will use in this exercise was already introduced under Data Fusion Example in this chapter, please go back and read it again before continuing with this exercise. \n", "In short, in this example, we would like to integrate Yeild.csv and Treatment.csv to see if the amount of water that can impact the amount of yield.\n", diff --git a/Chapter13/Chapter 13 Exercises.ipynb b/Chapter13/Chapter 13 Exercises.ipynb index 0a12eda..003b444 100644 --- a/Chapter13/Chapter 13 Exercises.ipynb +++ b/Chapter13/Chapter 13 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 13: Data Reduction \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -29,7 +29,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "In your own words, describe the similarities and differences between Data Reduction and Data Redundancy from the following angles: the literal meanings of the terms, their objectives, and procedural.\n" ] }, @@ -44,7 +44,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "If one decides to include or exclude independent attributes based on the correlation coefficient value of each independent attribute with the dependent attribute in a prediction task, how would you label the name of this preprocessing? Data redundancy or data reduction?" ] }, @@ -59,7 +59,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "In this example, we will be using **new_train.csv** from https://www.kaggle.com/rashmiranu/banking-dataset-classification. Each row of the data contains customer information along with campaign efforts regarding each customer to get them to subscribe for a long-term deposit at the bank. In this example, we would like to tune a decision tree that can show us the trends that leads to successful subscription campaigning. As the only tuning process we know will be computationally very expensive, we have decided to perform one of the numerosity data reductions we’ve learned in this chapter to ease the computation for the tuning process. Which method would fit this data better? Why? Once you arrived at the data reduction method you want to use, apply the method, tune the decision tree and draw the final decision tree. In the end, comment on a few interesting patterns you found on the final decision tree. " ] }, @@ -386,7 +386,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 4\n", + "# Exercise 4\n", "In this chapter, we learned six dimensionality reduction methods. For each of the six methods, specify if the method is supervised or unsupervised, and why?" 
] }, @@ -401,7 +401,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 5\n", + "# Exercise 5\n", "Use Decision Tree and Random Forest to evaluate the usefulness of the independent attributes in **new_train.csv**. Report and compare the results from both dimension reduction methods." ] }, @@ -416,7 +416,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 6\n", + "# Exercise 6\n", "Use Brute-force Computational Dimension Reduction to figure out the optimum subset of the independent attributes that the KNN algorithm needs for the classification task described in Exercise 3. If the task is computationally too expensive, what is one strategy that we learned that can curb that? If you did end up using that strategy, could you say the subset you’ve found is still optimum?" ] }, @@ -431,7 +431,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 7\n", + "# Exercise 7\n", "In this exercise, we will use the data ToyotaCorolla.csv to create a prediction model using MLP that can predict car prices. Take the following steps.\n", "\n", " a.\tDeal with all the data cleaning issues if any." @@ -494,7 +494,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 8\n", + "# Exercise 8\n", "\n", "In this exercise, we would like to use the dataset **Cereals.csv**. This dataset contains rows of information about different cereal products. We would like to perform clustering analysis on this dataset, first using K-means and then using PCA. Perform the following steps." ] @@ -841,7 +841,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 9\n", + "# Exercise 9\n", "\n" ] }, @@ -1515,7 +1515,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 10\n", + "# Exercise 10\n", "Figure 13.2 was created using a Decision Tree after random sampling. Recreate the figure but this time use random Over/under-sampling where the sample has 500 churning customers and 500 nob-churning customers. Describe the differences in the final visual. " ] }, @@ -1530,7 +1530,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 11\n", + "# Exercise 11\n", "Figure 13.7 shows the result of dimension reduction for the task of predicting the next day’s Amazon Stock Price using Linear Regression. Perform the dimension reduction using the Decision Tree and compare the results. Don’t forget, to do so, you would first need to tune the DecisionTreeRegressor() from sklearn.tree. You may use the following code for tuning." ] }, diff --git a/Chapter14/Chapter 14 Exercises.ipynb b/Chapter14/Chapter 14 Exercises.ipynb index 4146c0d..c9cb063 100644 --- a/Chapter14/Chapter 14 Exercises.ipynb +++ b/Chapter14/Chapter 14 Exercises.ipynb @@ -10,7 +10,7 @@ " AUTHOR: Dr. Roy Jafari \n", "\n", "### Chapter 14: Data Transformation and Data Massaging \n", - "#### Excercises" + "#### Exercises" ] }, { @@ -29,7 +29,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 1\n", + "# Exercise 1\n", "In your own words, what are the difference and similarities between Normalization and Standardization? How come some use them interchangeably? \n" ] }, @@ -44,7 +44,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 2\n", + "# Exercise 2\n", "There are two instances of Data Transformation done during the subchapter Binary Coding, Ranking Transformation, and Discretization that can be labeled as Massaging. Try to spot them and explain how come they can be labeled that way." 
] }, @@ -59,7 +59,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 3\n", + "# Exercise 3\n", "Of course, we know that one of the ways that the color of a data object is presented is by using their names. This is why we would assume color probably should be a normal attribute. However, you can transform this usually nominal attribute to numerical ones. There are two possible approaches. What are they? (Hint: one of them is attribute construction using RGB coding). Apply the two approaches to the following small dataset. The data shown in the table below are accessible in the file color_nominal.csv." ] }, @@ -303,7 +303,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 4\n", + "# Exercise 4\n", "You’ve seen two examples of attribute construction so far. One was under the subchapter Example – Construct one transformed attribute from two attributes, and the other was the previous exercises. Use these examples to argue if Attribute Construction is Data Massaging or not? " ] }, @@ -318,7 +318,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 5\n", + "# Exercise 5\n", "In this exercise, you will get to work on a dataset collected for research and development. The dataset was used in a recent publication titled Misfire and valve clearance faults detection in the combustion engines based on a multi-sensor vibration signal monitoring to show high-accuracy detection of engine failure is possible using vibrational signals. To see this article you may use this link: https://www.sciencedirect.com/science/article/abs/pii/S0263224118303439.\n", "\n", "The dataset that you have access to is Noise_Analysis.csv. This dataset has 7500 rows, each showing 1 second (1000 milliseconds) of the engine vibrational signal and the state of the engine (Label). We want to use the vibrational signal to predict the state of the engine. There are 5 states - H: Healthy, M1: Missfire 1, M2: Missfire 2, M12: Missfire 1&2, VC: Valve Clearance. \n", @@ -991,7 +991,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 6\n", + "# Exercise 6\n", "We discussed in this chapter the possible distinction between Data Massaging and Data Transformation. We also saw that Functional Data Analysis (FDA) can be used both for Data Reduction and Data Transformation. Review all of the FDA examples you have experienced in this book (Chapter 13 Data Reduction, and this chapter) and use them to make a case regarding if the FDA should be labeled as Data Massaging or not. " ] }, @@ -1006,7 +1006,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 7\n", + "# Exercise 7\n", "Review exercise 8 in Chapter 12 Data Fusion & Integration. In that exercise, we transformed the attribute of one of the datasets so the fusion of the two sources became possible. How would you describe that data transformation? Could we call it Data Massaging?\n" ] }, @@ -1021,7 +1021,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 8\n", + "# Exercise 8\n", "In this exercise, we will use BrainAllometry_Supplement_Data.csv from a paper titled The allometry of brain size in mammals. The data may also be accessed from https://datadryad.org/stash/dataset/doi:10.5061/dryad.2r62k7s. \n", "With the hope of being able to see the relationship between mean body mass and mean brain mass of species in nature, one has come up with the following scatterplot. However, you can see that the relationship is not very well shown. What transformation could fix this? 
Apply it and then share your observations.\n" ] @@ -1061,7 +1061,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 9\n", + "# Exercise 9\n", "\n", "In this chapter we learned three techniques to deal with noise, naming, Smoothing, Aggregation and Binning. Why do you think these methods were covered under Data Transformation, and not under Data Cleaning Level Ⅲ? Explain." ] @@ -1077,7 +1077,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 10\n", + "# Exercise 10\n", "In two chapters, (Chapter 13 Data Reduction, and this chapter) and under three areas of data preprocessing we have shown the applications of Functional Data Analysis (FDA): Data Reduction, Feature Extraction, Smoothing. Find examples of the FDA in these two chapters, and then explain how come FDA can manage to do all these different data preprocessing. What is about FDA that allows it to be such a multipurpose toolkit? \n" ] }, @@ -1092,7 +1092,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 11\n", + "# Exercise 11\n", "In Figure 14.18, we saw that.KernelReg() on all of signal_df did not perform very well, but it did perform excellently on part of it. How about trying to smooth all of singal_df by a combination of rolling data smoothing and functional data smoothing? To do this we need to have windows rolling calculations with a step size. Unfortunately, the pandas .rolling() function only accommodate the step size of one as shown in Figure 14.18. So take the matters in your hand and engineer a looping mechanism that can use .KernelReg() to smooth all of singal_df. \n" ] }, @@ -1132,7 +1132,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 12\n", + "# Exercise 12\n", "Use United_States_COVID-19_Cases_and_Deaths_by_State_over_Time.csv to recreate Figure 14.21. You may want to pull the most updated data from https://catalog.data.gov/dataset/united-states-covid-19-cases-and-deaths-by-state-over-time to develop an updated visualization. (Hint: you will need to work with the two columns new_case and new_death) \n" ] }, @@ -1472,7 +1472,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Excercise 13\n", + "# Exercise 13\n", "It may seem like that binning and aggregation are the same method, however, they are not. Study the two examples in this chapter and explain what’s the difference between Aggregation and binning?\n" ] },