1. Let X be a domain and {0,1} be a set of labels. Prove that for every distribution D over X × {0,1}, there exists a learning algorithm A_D that is better than any other learning algorithm with respect to D.
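Hint (a sketch, not part of the exercise statement; it assumes "better than" refers to the comparison of true risks L_D defined earlier): a natural candidate for A_D is the algorithm that ignores its training set and always outputs the Bayes-optimal predictor of D,

    f_D(x) = 1 if P_{(x′,y)∼D}[y = 1 | x′ = x] ≥ 1/2, and f_D(x) = 0 otherwise.

Since no classifier h achieves L_D(h) < L_D(f_D), no learning algorithm can output predictors of smaller true risk than A_D does.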
2. Prove that for every learning algorithm A there exist a probability distribution, D, and a learning algorithm B such that A is not better than B w.r.t. D.
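Hint (again a sketch, under the same reading of "better than"): this can be derived from Exercise 1. Given A, fix any distribution D and take B = A_D from Exercise 1. Since B always outputs the Bayes-optimal predictor f_D, for every sample size m,

    E_{S∼D^m}[L_D(A(S))] ≥ L_D(f_D) = E_{S∼D^m}[L_D(B(S))],

so A is not better than B w.r.t. D.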
3. Consider a variant of the PAC model in which there are two example oracles: one that generates positive examples and one that generates negative examples, both according to the underlying distribution D on X. Formally, given a target function f : X → {0,1}, let D+ be the distribution over X+ = {x ∈ X : f(x) = 1} defined by D+(A) = D(A)/D(X+), for every A ⊂ X+. Similarly, D− is the distribution over X− induced by D. The definition of PAC learnability in the two-oracle model is the same as the standard definition of PAC learnability, except that here the learner has access to m_H^+(ε, δ) i.i.d. examples from D+ and m_H^−(ε, δ) i.i.d. examples from D−. The learner's goal is to output h s.t. with probability at least 1 − δ (over the choice of the two training sets, and possibly over the nondeterministic decisions made by the learning algorithm), both L_(D+,f)(h) ≤ ε and L_(D−,f)(h) ≤ ε.
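To make the two-oracle protocol concrete, the following is a minimal Python simulation on a toy finite domain; the domain, the target f, the uniform choice of D, and the memorizing learner are all illustrative assumptions, not part of the exercise.

    import random

    random.seed(0)

    # Toy domain: the integers 0..9; the target f labels x positive iff x >= 5.
    X = list(range(10))

    def f(x):
        return 1 if x >= 5 else 0

    # D is uniform on X here; split X into the positive and negative regions.
    X_pos = [x for x in X if f(x) == 1]   # X+
    X_neg = [x for x in X if f(x) == 0]   # X-

    def sample_pos(m):
        """m i.i.d. draws from D+ (D restricted to X+ and renormalized)."""
        return [random.choice(X_pos) for _ in range(m)]

    def sample_neg(m):
        """m i.i.d. draws from D- (D restricted to X- and renormalized)."""
        return [random.choice(X_neg) for _ in range(m)]

    def learn(pos_examples, neg_examples):
        """An illustrative consistent learner: predict 1 iff x appeared as a
        positive example. (neg_examples is unused by this particular learner,
        but it is part of the two-oracle interface.)"""
        seen_pos = set(pos_examples)
        return lambda x: 1 if x in seen_pos else 0

    def true_error(h, region, label):
        """L_(D+,f)(h) or L_(D-,f)(h); exact here because D+ and D- are
        uniform over finite regions, so we can enumerate them."""
        return sum(h(x) != label for x in region) / len(region)

    # One run of the protocol with m+ = m- = 20 examples from each oracle.
    h = learn(sample_pos(20), sample_neg(20))
    print("L_(D+,f)(h) =", true_error(h, X_pos, 1))   # false-negative rate on X+
    print("L_(D-,f)(h) =", true_error(h, X_neg, 0))   # false-positive rate on X-

With enough positive examples the memorizing learner drives both error terms to 0 on this finite domain, matching the learner's goal of achieving L_(D+,f)(h) ≤ ε and L_(D−,f)(h) ≤ ε simultaneously.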