1. Derive stochastic gradient-descent steps for the variant of L1-loss classification introduced in Exercise 6. You can use a constant step size. (A sketch of one possible update appears after this exercise list.)
2. Derive stochastic gradient-descent steps for SVMs with quadratic loss instead of hinge loss. You can use a constant step size. (See the second sketch after the list.)
3. Provide an algorithm to perform classification with explicit kernel feature transformation and the Nyström approximation. How would you use ensembles to make the algorithm efficient and accurate? (See the third sketch after the list.)
4. Consider an SVM with properly optimized parameters. Provide an intuitive argument as to why the out-of-sample error rate of the SVM will usually be less than the fraction of support vectors in the training data. (A leave-one-out hint appears after the list.)
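For Exercise 1, a minimal sketch, writing $W$ for the coefficient vector and assuming that the Exercise 6 variant minimizes the aggregate L1 loss $\sum_i |y_i - W \cdot X_i|$ over training pairs $(X_i, y_i)$ with $y_i \in \{-1, +1\}$ (this reading of Exercise 6 is an assumption). For a randomly sampled pair, a subgradient of $|y_i - W \cdot X_i|$ with respect to $W$ is $-\,\mathrm{sign}(y_i - W \cdot X_i)\, X_i$, so a constant-step update is

\[
W \;\Leftarrow\; W + \alpha\, \mathrm{sign}(y_i - W \cdot X_i)\, X_i ,
\]

where $\alpha > 0$ is the constant step size. The loss is non-differentiable at $y_i = W \cdot X_i$; there, any value in $[-1, 1]$ (e.g., 0) may replace the sign.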
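For Exercise 2, a sketch under the common L2-SVM convention in which the hinge term is replaced by its square (the regularization weight $\lambda$ is an assumed symbol). The point-wise objective is

\[
J_i \;=\; \frac{\lambda}{2}\,\|W\|^2 \;+\; \max\{0,\; 1 - y_i\, W \cdot X_i\}^2 ,
\]

whose gradient with respect to $W$ is $\lambda W - 2\max\{0,\, 1 - y_i\, W \cdot X_i\}\, y_i X_i$. The constant-step update for a randomly sampled $(X_i, y_i)$ is therefore

\[
W \;\Leftarrow\; W(1 - \alpha\lambda) \;+\; 2\alpha \max\{0,\; 1 - y_i\, W \cdot X_i\}\, y_i X_i .
\]

Unlike the hinge loss, the quadratic loss is differentiable everywhere (its derivative is continuous at the margin boundary), so no subgradient case analysis is needed.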
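For Exercise 3, a minimal runnable sketch in Python, assuming an RBF kernel; the function names (nystroem_features, train_linear_svm_sgd, ensemble_predict) and all parameter defaults are illustrative assumptions, not fixed by the exercise. Each ensemble component draws its own small landmark sample, so every model stays cheap, while voting across components reduces the variance introduced by the Nyström approximation:

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # Pairwise RBF kernel values between the rows of A and the rows of B.
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)

    def nystroem_features(X, landmarks, gamma=1.0, eps=1e-10):
        # Explicit m-dimensional feature map from the Nystroem approximation:
        # phi(x) = Lambda^{-1/2} U^T k(x, landmarks), with U, Lambda from the
        # eigendecomposition of the m x m landmark kernel matrix.
        K_mm = rbf_kernel(landmarks, landmarks, gamma)
        vals, vecs = np.linalg.eigh(K_mm)
        vals = np.maximum(vals, eps)   # guard against tiny/negative eigenvalues
        K_nm = rbf_kernel(X, landmarks, gamma)
        return (K_nm @ vecs) / np.sqrt(vals)

    def train_linear_svm_sgd(Phi, y, lam=0.01, alpha=0.01, epochs=20, seed=0):
        # Hinge-loss SGD in the primal on the explicit features (constant step).
        rng = np.random.default_rng(seed)
        w = np.zeros(Phi.shape[1])
        for _ in range(epochs):
            for i in rng.permutation(len(y)):
                if y[i] * (Phi[i] @ w) < 1:
                    w -= alpha * (lam * w - y[i] * Phi[i])
                else:
                    w -= alpha * lam * w
        return w

    def ensemble_predict(X_train, y_train, X_test, n_models=5, m=50, gamma=1.0):
        # Each component model draws its own random landmark sample, so m can
        # be kept small (efficiency); voting across components recovers accuracy.
        rng = np.random.default_rng(0)
        votes = np.zeros(len(X_test))
        for k in range(n_models):
            idx = rng.choice(len(X_train), size=m, replace=False)
            Phi_train = nystroem_features(X_train, X_train[idx], gamma)
            w = train_linear_svm_sgd(Phi_train, y_train, seed=k)
            votes += np.sign(nystroem_features(X_test, X_train[idx], gamma) @ w)
        return np.sign(votes)   # majority vote over component predictions

The design rationale is that each component costs roughly $O(m^3)$ for the eigendecomposition plus $O(nm)$ per training epoch, so several small-$m$ voting components are typically cheaper than a single large-$m$ model of comparable accuracy.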
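For Exercise 4, a hint in the form of the classical leave-one-out argument, stated here as a sketch rather than a full solution: deleting a training point that is not a support vector leaves the SVM decision boundary unchanged, so that point is classified correctly by the model retrained without it. Leave-one-out errors can therefore occur only at support vectors, giving

\[
\text{LOO error} \;\le\; \frac{\text{number of support vectors}}{n} .
\]

Since the leave-one-out error is a nearly unbiased estimate of the out-of-sample error, the error rate is at most the support-vector fraction; the bound is usually loose because many support vectors are still classified correctly when held out, which matches the "usually less than" phrasing of the exercise.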