1. Prove Claim 14.10. Hint: Extend the proof of Lemma 13.5.
2. Prove Corollary 14.14.
3. Perceptron as a subgradient descent algorithm: Let $S = ((\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)) \in (\mathbb{R}^d \times \{\pm 1\})^m$. Assume that there exists $\mathbf{w} \in \mathbb{R}^d$ such that for every $i \in [m]$ we have $y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \ge 1$, and let $\mathbf{w}^\star$ be a vector that has the minimal norm among all vectors that satisfy the preceding requirement. Let $R = \max_i \|\mathbf{x}_i\|$. Define a function
   $$f(\mathbf{w}) = \max_{i \in [m]} \bigl(1 - y_i \langle \mathbf{w}, \mathbf{x}_i \rangle\bigr).$$
   - Show that $\min_{\mathbf{w} : \|\mathbf{w}\| \le \|\mathbf{w}^\star\|} f(\mathbf{w}) = 0$ and show that any $\mathbf{w}$ for which $f(\mathbf{w}) < 1$ separates the examples in $S$.
   - Show how to calculate a subgradient of $f$.
   - Describe and analyze the subgradient descent algorithm for this case. Compare the algorithm and the analysis to the Batch Perceptron algorithm given in Section 9.1.2 (a code sketch of such an algorithm follows this exercise).
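The exercise asks only for a description and analysis, but a small numerical sketch may help make the construction concrete. The Python snippet below is illustrative rather than part of the exercise statement: the function names are hypothetical, and it assumes the generic projected subgradient descent setup, i.e. step size $\eta = B/(\rho\sqrt{T})$ with $B = \|\mathbf{w}^\star\|$ (assumed known for simplicity) and $\rho = R$, projection onto the ball of radius $B$, and the averaged iterate as output. The subgradient used is $-y_j \mathbf{x}_j$ for an index $j$ attaining the maximum in $f$.

```python
import numpy as np

def f_and_subgradient(w, X, y):
    """Evaluate f(w) = max_i (1 - y_i <w, x_i>) and return one subgradient.

    A subgradient of a pointwise maximum is the gradient of any term that
    attains the maximum, i.e. -y_j * x_j for j in argmax_i (1 - y_i <w, x_i>).
    """
    margins = 1.0 - y * (X @ w)          # vector of (1 - y_i <w, x_i>)
    j = int(np.argmax(margins))          # an index attaining the maximum
    return margins[j], -y[j] * X[j]

def perceptron_subgradient_descent(X, y, B, T):
    """Projected subgradient descent on f over the ball {w : ||w|| <= B}.

    Here B plays the role of ||w*||; the step size eta = B / (rho * sqrt(T))
    with rho = max_i ||x_i||, and the averaged iterate as output, follow the
    generic subgradient descent analysis.
    """
    d = X.shape[1]
    rho = np.linalg.norm(X, axis=1).max()
    eta = B / (rho * np.sqrt(T))
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for _ in range(T):
        w_bar += w / T                   # accumulate the averaged iterate
        _, v = f_and_subgradient(w, X, y)
        w = w - eta * v                  # subgradient step
        norm = np.linalg.norm(w)
        if norm > B:
            w *= B / norm                # project back onto the ball of radius B
    return w_bar
```

Under the generic analysis with this step size, roughly $(R \|\mathbf{w}^\star\|)^2$ iterations suffice for the averaged iterate to satisfy $f(\bar{\mathbf{w}}) < 1$, and hence to separate $S$; this is the natural point of comparison with the Batch Perceptron guarantee of Section 9.1.2.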