1. Let H be a hypothesis class of binary classifiers. Show that if H is agnostic PAC learnable, then H is PAC learnable as well. Furthermore, show that if A is a successful agnostic PAC learner for H, then A is also a successful PAC learner for H.
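A minimal proof sketch, assuming the standard definitions (in particular, that the PAC setting adds the realizability assumption, i.e. some h* ∈ H has zero true risk):

\[
\begin{aligned}
&\text{Agnostic guarantee: with probability at least } 1-\delta,\quad L_D(A(S)) \le \min_{h \in H} L_D(h) + \epsilon.\\
&\text{Realizability: } \exists\, h^\star \in H \text{ with } L_D(h^\star) = 0, \text{ hence } \min_{h \in H} L_D(h) = 0.\\
&\text{Combining: with probability at least } 1-\delta,\quad L_D(A(S)) \le \epsilon,
\end{aligned}
\]

which is exactly the PAC guarantee, with the same sample complexity.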
2. (*) The Bayes optimal predictor: Show that for every probability distribution D, the Bayes optimal predictor f_D is optimal, in the sense that for every classifier g from X to {0,1}, L_D(f_D) ≤ L_D(g).
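As a reminder of the relevant definition (assuming the standard setup where D is a joint distribution over X × {0,1}), the Bayes optimal predictor is

\[
f_D(x) =
\begin{cases}
1 & \text{if } \Pr[y = 1 \mid x] \ge 1/2,\\
0 & \text{otherwise.}
\end{cases}
\]

One possible route: condition on x. Given x, the predictor f_D errs with probability min{Pr[y = 1 | x], Pr[y = 0 | x]}, which lower-bounds the conditional error probability of any classifier g at that x; taking the expectation over x then yields L_D(f_D) ≤ L_D(g).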
3. (*) We say that a learning algorithm A is better than B with respect to some probability distribution D if

L_D(A(S)) ≤ L_D(B(S)) for all samples S ∈ (X × {0,1})^m.

We say that a learning algorithm A is better than B if it is better than B with respect to all probability distributions D over X × {0,1}.
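To illustrate the two quantifiers (the two algorithms below are hypothetical, chosen only for illustration): let A ignore the sample and always return the constant-0 classifier, and let B always return the constant-1 classifier. Under a distribution D for which y = 1 almost surely,

\[
L_D(A(S)) = 1 > 0 = L_D(B(S)),
\]

so A is not better than B with respect to this D; by the symmetric argument with a distribution where y = 0 almost surely, B is not better than A either. Hence neither algorithm is better than the other in the distribution-free sense.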