1. Suppose that you perform unregularized least-squares regression with the loss function $\sum_{i=1}^{n} (y_i - W \cdot X_i)^2$, but you add spherical Gaussian noise with variance $\lambda$ to each feature. Show that the expected loss over the perturbed features is identical to the loss function of L2-regularization. Use this result to provide an intuitive explanation of the connection between regularization and noise addition. (A sketch of the expectation computation appears after this exercise list.)
2. Show how to use the representer theorem to derive the closed-form solution of kernel least-squares regression. (A derivation sketch appears after this exercise list.)
3. Discuss the effect on the bias and variance of making the following changes to a classification algorithm (a small experiment illustrating one of these effects follows this exercise list):
(a) Increasing the regularization parameter in a support vector machine.
(b) Increasing the Laplacian smoothing parameter in a naïve Bayes classifier.
(c) Increasing the depth of a decision tree.
(d) Increasing the number of antecedents in a rule.
(e) Reducing the bandwidth $\sigma$ when using the Gaussian kernel in conjunction with a support vector machine.
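A minimal sketch of the expectation computation for Exercise 1, assuming perturbed features $X_i + \epsilon_i$ with noise $\epsilon_i \sim \mathcal{N}(0, \lambda I)$ drawn independently of the data and of one another:

\begin{align*}
E\left[\sum_{i=1}^{n}\bigl(y_i - W \cdot (X_i + \epsilon_i)\bigr)^2\right]
&= \sum_{i=1}^{n} E\left[\bigl((y_i - W \cdot X_i) - W \cdot \epsilon_i\bigr)^2\right] \\
&= \sum_{i=1}^{n} \left[(y_i - W \cdot X_i)^2 - 2\,(y_i - W \cdot X_i)\,E[W \cdot \epsilon_i] + E\bigl[(W \cdot \epsilon_i)^2\bigr]\right] \\
&= \sum_{i=1}^{n} (y_i - W \cdot X_i)^2 + n\,\lambda\,\|W\|^2,
\end{align*}

since $E[W \cdot \epsilon_i] = 0$ and $E[(W \cdot \epsilon_i)^2] = W^{T} E[\epsilon_i \epsilon_i^{T}] W = \lambda \|W\|^2$ for spherical noise. The expected perturbed loss is therefore the L2-regularized least-squares loss with regularization coefficient $n\lambda$: adding feature noise penalizes large weights in expectation, exactly as explicit regularization does.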
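A derivation sketch for Exercise 2, assuming the L2-regularized kernel objective with parameter $\lambda \geq 0$ and the $n \times n$ kernel matrix $K$ with entries $K_{ij} = \Phi(X_i) \cdot \Phi(X_j)$; setting $\lambda = 0$ recovers the unregularized case when $K$ is invertible. By the representer theorem, the optimal $W$ can be written as $W = \sum_{j=1}^{n} \alpha_j \Phi(X_j)$, so the vector of training predictions is $K\alpha$ and the objective becomes a function of $\alpha$ alone:

\begin{align*}
J(\alpha) &= \|y - K\alpha\|^2 + \lambda\,\alpha^{T} K \alpha \\
\nabla_{\alpha} J &= -2K(y - K\alpha) + 2\lambda K\alpha = 0 \\
\Rightarrow\quad \alpha &= (K + \lambda I)^{-1} y,
\end{align*}

where the last step uses the fact that $\alpha = (K + \lambda I)^{-1} y$ satisfies the stationarity condition $K\bigl(y - (K + \lambda I)\alpha\bigr) = 0$, and $K + \lambda I$ is invertible for $\lambda > 0$ because $K$ is positive semidefinite.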
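A minimal numerical sketch for Exercise 3(c), assuming scikit-learn is available; the synthetic dataset and depth values are arbitrary illustrative choices. Deeper trees fit the training data increasingly well (falling bias), while a widening gap between training and test accuracy signals rising variance. Analogous experiments can vary C in sklearn.svm.SVC for part (a), or gamma (which grows as the bandwidth $\sigma$ shrinks) for part (e).

# Illustration of (c): bias falls and variance rises with tree depth.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (arbitrary illustrative choice).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in [1, 3, 5, 10, 20]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    # Training accuracy keeps rising (bias drops); a growing
    # train-test gap indicates increasing variance.
    print(f"depth={depth:2d}  train={tree.score(X_tr, y_tr):.3f}  "
          f"test={tree.score(X_te, y_te):.3f}")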