1. Variable step size (*): Prove an analog of Theorem 14.8 for SGD with a variable step size, $\eta_t = \frac{B}{\rho\sqrt{t}}$.

16.1 Consider the task of finding a sequence of characters in a file, as described in Section 16.2.1. Show that every member of the class $\mathcal{H}$ can be realized by composing a linear classifier over $\psi(x)$, whose norm is $1$ and that attains a margin of $1$.
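For the feature map of Section 16.2.1, each coordinate of $\psi(x)$ indicates whether a particular string occurs as a substring of $x$, so the inner product $\langle\psi(x),\psi(x')\rangle$ is simply the number of distinct strings that are substrings of both inputs, and the kernel can be evaluated without materializing the (huge) feature space. A minimal sketch under that reading; the function names are mine, not from the text:

```python
def substrings(x):
    """Return the set of all nonempty contiguous substrings of x."""
    return {x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1)}

def substring_kernel(x, xp):
    """k(x, x') = <psi(x), psi(x')>, where psi(x)_v = 1 iff v is a substring of x.

    Since every coordinate of psi is 0 or 1, the inner product counts the
    distinct strings that appear as substrings of both x and x'.
    """
    return len(substrings(x) & substrings(xp))
```

For example, `substring_kernel("ab", "ba")` counts the common substrings `"a"` and `"b"`, giving 2.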
2. Kernelized Perceptron: Show how to run the Perceptron algorithm while only accessing the instances via the kernel function. Hint: The derivation is similar to the derivation of implementing SGD with kernels.
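Following the hint, the key observation is that the Perceptron's weight vector is always a sum of label-weighted feature vectors of previously misclassified examples, so it suffices to maintain the mistake counts $\alpha \in \mathbb{R}^m$ and predict using Gram-matrix entries only. A hedged sketch of one possible solution (the kernel choice and function names are my own):

```python
import numpy as np

def kernel_perceptron(X, y, kernel, epochs=10):
    """Perceptron that touches instances only through kernel(x, x').

    Maintains alpha in R^m with w = sum_i alpha_i * y_i * psi(x_i), so a
    prediction on x_i needs only the Gram-matrix column G[:, i], never psi.
    """
    m = len(X)
    G = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    alpha = np.zeros(m)
    for _ in range(epochs):
        mistakes = 0
        for i in range(m):
            # Margin of the current hypothesis on example i, via kernels only.
            if y[i] * np.sum(alpha * y * G[:, i]) <= 0:
                alpha[i] += 1.0  # record the mistake instead of updating w
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

# Toy usage: XOR is not linearly separable, but becomes separable
# in the feature space of a degree-2 polynomial kernel.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
poly = lambda u, v: (1.0 + u @ v) ** 2
alpha = kernel_perceptron(X, y, poly)
predict = lambda x: np.sign(sum(alpha[i] * y[i] * poly(X[i], x) for i in range(len(X))))
```

The same bookkeeping works for any valid kernel, since the update and the prediction never reference $\psi(x)$ directly.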
3. Kernel Ridge Regression: The ridge regression problem, with a feature mapping $\psi$, is the problem of finding a vector $w$ that minimizes the function
$$f(w) = \lambda\|w\|^2 + \frac{1}{2m}\sum_{i=1}^{m}\left(\langle w, \psi(x_i)\rangle - y_i\right)^2, \qquad (16.8)$$
and then returning the predictor $h(x) = \langle w, \psi(x)\rangle$. Show how to implement the ridge regression algorithm with kernels.
Hint: The representer theorem tells us that there exists a vector $\alpha \in \mathbb{R}^m$ such that $\sum_{i=1}^{m} \alpha_i \psi(x_i)$ is a minimizer of Equation (16.8).
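Following the hint, substituting $w = \sum_i \alpha_i \psi(x_i)$ into Equation (16.8) yields the objective $\lambda\,\alpha^\top G\alpha + \frac{1}{2m}\|G\alpha - y\|^2$ over the Gram matrix $G_{i,j} = \langle\psi(x_i),\psi(x_j)\rangle$, and setting its gradient to zero shows that $\alpha = (G + 2\lambda m I)^{-1}y$ is a minimizer. A sketch of this solution path (the helper names are mine, not the book's):

```python
import numpy as np

def kernel_ridge_fit(X, y, kernel, lam):
    """Minimize f(w) = lam*||w||^2 + (1/2m) * sum_i (<w, psi(x_i)> - y_i)^2.

    By the representer theorem w = sum_i alpha_i * psi(x_i); substituting and
    zeroing the gradient in alpha gives alpha = (G + 2*lam*m*I)^{-1} y,
    where G[i, j] = kernel(x_i, x_j).
    """
    m = len(X)
    G = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    return np.linalg.solve(G + 2.0 * lam * m * np.eye(m), y)

def kernel_ridge_predict(X_train, alpha, kernel, x):
    """h(x) = <w, psi(x)> = sum_i alpha_i * kernel(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train))

# Usage: with the linear kernel this reduces to ordinary ridge regression,
# so with a tiny lambda it should recover y = 2x almost exactly.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
linear = lambda u, v: float(u @ v)
alpha = kernel_ridge_fit(X, y, linear, lam=1e-6)
```

Note that only $G$ and kernel evaluations appear in training and prediction, which is exactly what "implementing with kernels" requires.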