1. Generalize the dynamic programming procedure given in Section 17.3 for solving the maximization problem given in the definition of $\hat{h}$ in the SGD procedure for multiclass prediction. You can assume that $\Delta(y', y) = \sum_{t=1}^{r} \delta(y'_t, y_t)$ for some arbitrary function $\delta$.
2. Prove that Equation (17.7) holds.
3. Show that the two definitions of π as defined in Equation (17.12) and Equation (17.13) are indeed equivalent for all the multivariate performance measures.
4. Show that any binary classifier $h : \{0,1\}^d \to \{0,1\}$ can be implemented as a decision tree of height at most $d + 1$, with internal nodes of the form $(x_i = 0?)$ for some $i \in \{1, \ldots, d\}$.
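
As an aid for experimenting with the last exercise, the following is a minimal sketch, not a proof, of one possible construction: a complete tree that tests one coordinate per level and stores a value of h at each leaf. The names `build_tree`, `predict`, and `height` are hypothetical helpers introduced only for this illustration; coordinates are indexed from 0 rather than 1, and height is assumed to count levels, leaves included.

```python
from itertools import product


def build_tree(h, d, prefix=()):
    """Build a complete decision tree for h over {0,1}^d.

    A leaf is the label h(x) (0 or 1). An internal node is a tuple
    (i, subtree_if_xi_is_0, subtree_if_xi_is_1), i.e. it asks (x_i = 0?).
    Assumption: coordinates are tested in the fixed order 0, 1, ..., d-1.
    """
    i = len(prefix)
    if i == d:
        # Every coordinate has been fixed, so the label is determined.
        return h(prefix)
    return (
        i,
        build_tree(h, d, prefix + (0,)),
        build_tree(h, d, prefix + (1,)),
    )


def predict(tree, x):
    """Walk from the root to a leaf, answering (x_i = 0?) at each node."""
    while isinstance(tree, tuple):
        i, if_zero, if_one = tree
        tree = if_zero if x[i] == 0 else if_one
    return tree


def height(tree):
    """Number of levels on the longest root-to-leaf path (a lone leaf has height 1)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + max(height(tree[1]), height(tree[2]))


if __name__ == "__main__":
    d = 3
    parity = lambda x: sum(x) % 2  # example classifier, chosen arbitrarily
    tree = build_tree(parity, d)
    # The tree agrees with the classifier on every input and has d + 1 levels.
    assert all(predict(tree, x) == parity(x) for x in product((0, 1), repeat=d))
    assert height(tree) == d + 1
```

The sketch always builds the full tree with 2^d leaves, which is enough to illustrate the height bound; it makes no attempt to merge subtrees that carry identical labels.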