1. • Mission: Write Python3 code to do binary classification. • Data set: The Horse Colic dataset. You need to use horse-colic.data and horse-colic.test as training set and test set respectively. - The available documentation is analyzed to assess the most appropriate treatment. Missing information is also properly identified. Read the dataset description for the dataset information. The goal is to predict attribute 23 ("what happened to the horse?") using attributes 1, 2 and 4 to 22 as predictors. We are concerned only with whether a horse died, not how it died, so you have to treat this as a binary problem (after grouping "euthanized" with "died"). This task has 2 fewer examples because the class variable is missing for two examples. In accordance with the documentation, attributes 3 and 28 are not used because they do not provide useful information. Attributes 25, 26 and 27 ("type of lesion?") are also discarded because they represent alternative class variables. Please note that the counts of missing values are calculated based on the complete dataset. • Approaches: - Classifier (required): k-nearest neighbors. Please use the scikit-learn class sklearn.neighbors.KNeighborsClassifier. - Imputation (required): k-nearest neighbors. Please use the scikit-learn class sklearn.impute.KNNImputer. - Other data pre-processing or feature engineering methods (optional): note that the types of attributes include continuous, discrete, and categorical. You can apply any technique you prefer. • Performance metric: accuracy classification score. Please use the scikit-learn function sklearn.metrics.accuracy_score. • Submission: Please submit two files. First file is the source code (.ipynb) which contains all your source code. Please name the second file containing the screenshot of your code results

Question

Solve this question and submit screenshots of the code and its results

1.
• Mission: Write Python3 code to do binary classification.
• Data set: The Horse Colic dataset. You need to use horse-colic.data and horse-colic.test as training
set and test set respectively.
The available documentation is analyzed to assess the most appropriate treatment.
Missing information is also properly identified. Read the dataset description for the dataset
information. The goal is to predict attribute 23 ("what happened to the horse?")
using attributes 1, 2 and 4 to 22 as predictors. We are concerned only with whether a horse died,
not how it died, so you have to treat this as a binary problem (after grouping
"euthanized" with "died").
This task has 2 fewer examples because the class variable is missing for two examples.
In accordance with the documentation, attributes 3 and 28 are not used because they do not provide
useful information. Attributes 25, 26 and 27 ("type of lesion?") are also discarded because they
represent alternative class variables. Please note that the counts of missing values are
calculated based on the complete dataset.
• Approaches:
- Classifier (required): k-nearest neighbors. Please use the scikit-learn class
sklearn.neighbors.KNeighborsClassifier.
- Imputation (required): k-nearest neighbors. Please use the scikit-learn class
sklearn.impute.KNNImputer.
- Other data pre-processing or feature engineering methods (optional): note that the types of
attributes include continuous, discrete, and categorical. You can apply any technique you prefer.
• Performance metric: accuracy classification score. Please use the scikit-learn function
sklearn.metrics.accuracy_score.
• Submission: Please submit two files. First file is the source code (.ipynb)
which contains all your source code. Please name the second file containing the screenshot
of your code results
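
The requirements above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes the two files are whitespace-separated with no header row and use "?" for missing values, and that attribute 23 is coded 1 = lived, 2 = died, 3 = euthanized (check the dataset description before relying on this). The helper names `load_split`, `prepare`, and `build_model`, the choice of `n_neighbors=5`, and the use of `StandardScaler` are illustrative choices that the question does not prescribe.

```python
import os

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The task numbers attributes from 1; pandas column labels start at 0.
PREDICTORS = [0, 1] + list(range(3, 22))  # attributes 1, 2 and 4 to 22
TARGET = 22                               # attribute 23


def prepare(df):
    """Drop rows with a missing class and return (X, y) with a binary target."""
    df = df.dropna(subset=[TARGET])
    X = df[PREDICTORS].to_numpy(dtype=float)
    # Assumed coding: 1 = lived, 2 = died, 3 = euthanized.
    # "euthanized" is grouped with "died": 0 = lived, 1 = died.
    y = (df[TARGET] != 1).astype(int).to_numpy()
    return X, y


def load_split(path):
    """Read one file (assumed: whitespace-separated, no header, '?' = missing)."""
    df = pd.read_csv(path, sep=r"\s+", header=None, na_values="?")
    return prepare(df)


def build_model(n_neighbors=5):
    """Scale, impute with KNNImputer, then classify with KNeighborsClassifier."""
    return make_pipeline(
        StandardScaler(),                     # ignores NaN when fitting, keeps NaN on transform
        KNNImputer(n_neighbors=n_neighbors),  # required imputation step
        KNeighborsClassifier(n_neighbors=n_neighbors),  # required classifier
    )


# Runs only when both data files are present in the working directory.
if os.path.exists("horse-colic.data") and os.path.exists("horse-colic.test"):
    X_train, y_train = load_split("horse-colic.data")
    X_test, y_test = load_split("horse-colic.test")
    model = build_model().fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

Scaling comes before imputation here because both k-nearest-neighbors steps are distance-based, so unscaled attributes with large ranges would otherwise dominate. Categorical attributes are treated as plain numeric codes in this sketch, which is a simplification; one-hot encoding them after imputation is one of the optional pre-processing choices the question allows.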