CS 5630, Data Mining
Spring 2020 Homework 6
Please work with the OJ data set from the ISLR package, and use 123 as the random seed in every part that requires one.
(a) [10pts] Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
(b) [10pts] Fit a Naive Bayes classifier to the training data with Purchase as the response and the other variables as predictors. What are the training error rate and test error rate?
(c) [10pts] Fit a support vector classifier to the training data in part (a) using cost=0.01 with Purchase as the response and the other variables as predictors. What are the training and test error rates?
(d) [10pts] Use the tune() function to select an optimal cost. Consider cost values 0.001, 0.01, 0.1, 1, 10, 100.
(e) [10pts] Compute the training and test error rates using the best model from part (d).
(f) [10pts] Repeat part (c) using a support vector machine with a radial kernel. Use the default value of gamma.
(g) [10pts] Repeat part (f) using a support vector machine with a polynomial kernel with degree=2.
(h) [10pts] Comparing parts (e), (f), and (g), which approach has the lowest test error rate?
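
The setup shared by all parts can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes the e1071 package supplies svm() (the assignment names only tune(), so the package choice is an assumption), and the names train_idx, train, test, and error_rate are hypothetical helpers introduced here.

```r
library(ISLR)   # provides the OJ data set
library(e1071)  # assumed source of svm(); not named in the assignment

set.seed(123)                        # seed required by the assignment
train_idx <- sample(nrow(OJ), 800)   # 800 row indices drawn without replacement
train <- OJ[train_idx, ]             # training set
test  <- OJ[-train_idx, ]            # remaining observations form the test set

# Hypothetical helper: fraction of predictions that differ from the true labels
error_rate <- function(predicted, actual) mean(predicted != actual)

# Example: a support vector classifier (linear kernel) with cost = 0.01
fit <- svm(Purchase ~ ., data = train, kernel = "linear", cost = 0.01)
train_err <- error_rate(predict(fit, train), train$Purchase)
test_err  <- error_rate(predict(fit, test),  test$Purchase)
```

The same error_rate helper can be reused for every model in the assignment, so the training and test error rates are computed the same way in each part.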