
attached image contains external data

Question 1

a)

The world population data spans from 1960 to 2017. We'd like to build a predictive model that can give us the best guess at what the future or past population of a particular country was or might be.

First, however, we need to format our data so that sklearn's Ridge regression class can train on it. To do this, we will write a function that takes a country name as input and returns a 2-d numpy array containing the year and the measured population.

Function Specifications:

  • Should take a str as input and return a numpy array type as output.
  • The array should have only two columns, containing the year and the population; in other words, it should have shape (?, 2), where ? is the length of the data.
  • The values within the array should be of type int.

Hint: You'll need to use both the population and country map dataframes given above.

def get_year_pop(country_name):
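One possible way to fill in this stub is sketched below. The layout of the two dataframes is not shown in the question, so this assumes that both are indexed by 'Country Code', that meta_df has a 'Country Name' column, and that population_df has one column per year. The tiny dataframes defined here are hypothetical stand-ins so the example runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real dataframes (assumed layout):
# both are indexed by 'Country Code'; population_df has one column per year.
population_df = pd.DataFrame(
    {"1960": [100.0, 200.0], "1961": [110.0, 210.0], "1962": [120.0, 220.0]},
    index=pd.Index(["AAA", "BBB"], name="Country Code"),
)
meta_df = pd.DataFrame(
    {"Country Name": ["Aland", "Bland"]},  # column name is an assumption
    index=pd.Index(["AAA", "BBB"], name="Country Code"),
)


def get_year_pop(country_name):
    # Find the country code whose metadata row matches the given name.
    codes = meta_df.index[meta_df["Country Name"] == country_name]
    if len(codes) == 0:
        raise ValueError(f"Unknown country: {country_name!r}")
    # Select that country's row: a Series indexed by the year labels.
    row = population_df.loc[codes[0]]
    years = row.index.astype(int).to_numpy()
    pops = row.to_numpy().astype(int)
    # Stack the two 1-d arrays as columns to get shape (n_years, 2).
    return np.column_stack((years, pops))
```

Calling get_year_pop("Bland") on the stand-in data returns an integer array with one row per year. Note that the cast to int would fail on missing values, so real data with gaps may need to be cleaned first.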

 

b)

 

Now that we have our data, we need to split it into a training set and a testing set. But before we split our data into training and testing, we also need to split it into the predictive features (denoted X) and the response (denoted y).

Write a function that will take as input a 2-d numpy array and return four variables in the form of (X_train, y_train), (X_test, y_test), where (X_train, y_train) are the features + response of the training set, and (X_test, y_test) are the features + response of the testing set.

Function Specifications:

  • Should take a 2-d numpy array as input.
  • Should split the array such that X is the year, and y is the corresponding population.
  • Should return two tuples of the form (X_train, y_train), (X_test, y_test).

def feature_response_split(arr):
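A minimal sketch of this stub follows. The question does not say how the rows should be divided between training and testing, so the alternating split used here (even-indexed rows for training, odd-indexed rows for testing) is an assumption; sklearn's train_test_split would be a reasonable alternative if a random split is wanted.

```python
import numpy as np


def feature_response_split(arr):
    # Column 0 holds the year (feature), column 1 the population (response).
    X = arr[:, 0]
    y = arr[:, 1]
    # Assumed split rule: every other row goes to the training set,
    # the remaining rows go to the testing set.
    X_train, y_train = X[::2], y[::2]
    X_test, y_test = X[1::2], y[1::2]
    return (X_train, y_train), (X_test, y_test)
```

The function can be unpacked directly as (X_train, y_train), (X_test, y_test) = feature_response_split(arr), where arr is the array produced in part a).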

c)

Now that we have formatted our data, we can fit a model using sklearn's Ridge() class. We'll write a function that takes as input the features and response variables that we created in the last question and returns a trained model.

Function Specifications:

  • Should take two numpy arrays as input in the form (X_train, y_train).
  • Should return an sklearn Ridge model.
  • The returned model should be fitted to the data.

Hint: You may need to reshape the data within the function. You can use .reshape(-1, 1) to do this.
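The specification above could be sketched as follows. The question gives no function name for this part, so train_model is a hypothetical name, and the model uses Ridge's default settings (alpha=1.0) because no regularisation strength is specified.

```python
import numpy as np
from sklearn.linear_model import Ridge


def train_model(X_train, y_train):
    # sklearn expects a 2-d feature matrix, so turn the 1-d array of
    # years into a single-column array of shape (n_samples, 1).
    X_2d = np.asarray(X_train).reshape(-1, 1)
    model = Ridge()  # default alpha=1.0; an assumption, not from the question
    model.fit(X_2d, y_train)
    return model
```

When predicting, the input needs the same single-column shape, for example model.predict(np.array([[2020]])).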

import numpy as np
import pandas as pd
from numpy import array
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
population_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/world_population.csv', index_col='Country Code')
meta_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/metadata.csv', index_col='Country Code')
population_df.head()
Follow-up Question

Thanks for answering the first question. This is the follow-up question:

b)

 

Now that we have our data, we need to split it into a training set and a testing set. But before we split our data into training and testing, we also need to split it into the predictive features (denoted X) and the response (denoted y).

Write a function that will take as input a 2-d numpy array and return four variables in the form of (X_train, y_train), (X_test, y_test), where (X_train, y_train) are the features + response of the training set, and (X_test, y_test) are the features + response of the testing set.

Function Specifications:

  • Should take a 2-d numpy array as input.
  • Should split the array such that X is the year, and y is the corresponding population.
  • Should return two tuples of the form (X_train, y_train), (X_test, y_test).