1. Let H be some hypothesis class. For any h ∈ H, let |h| denote the description length of h, according to some fixed description language. Consider the MDL learning paradigm in which the algorithm returns
\[
h_S \in \operatorname*{argmin}_{h \in H} \left[ L_S(h) + \sqrt{\frac{|h| + \ln(2/\delta)}{2m}} \right],
\]
where S is a sample of size m. For any B > 0, let H_B = {h ∈ H : |h| ≤ B}, and define h∗_B = argmin_{h ∈ H_B} L_D(h). Prove a bound on L_D(h_S) − L_D(h∗_B) in terms of B, the confidence parameter δ, and the size of the training set m.
Note: Such bounds are known as oracle inequalities in the literature: we wish to estimate how good we are compared to a reference classifier (or "oracle") h∗_B.
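As a rough sketch of the shape such a bound typically takes, one can combine two standard ingredients (not restated in the exercise): the MDL generalization bound, which holds uniformly over H, and a one-sided Hoeffding bound applied to the single fixed hypothesis h∗_B. The particular confidence bookkeeping below (two events, each at level δ) is one of several reasonable choices, not the unique intended answer:

% Assumed ingredients:
%  (i)  MDL bound: with probability at least 1 - \delta over S ~ D^m, for every h in H,
%       L_D(h) <= L_S(h) + sqrt((|h| + ln(2/\delta)) / (2m)).
%  (ii) One-sided Hoeffding for the fixed hypothesis h*_B: with probability at least 1 - \delta,
%       L_S(h*_B) <= L_D(h*_B) + sqrt(ln(1/\delta) / (2m)).
\begin{align*}
L_D(h_S)
  &\le L_S(h_S) + \sqrt{\tfrac{|h_S| + \ln(2/\delta)}{2m}}
      && \text{by (i)} \\
  &\le L_S(h^*_B) + \sqrt{\tfrac{|h^*_B| + \ln(2/\delta)}{2m}}
      && \text{since $h_S$ minimizes the penalized objective} \\
  &\le L_S(h^*_B) + \sqrt{\tfrac{B + \ln(2/\delta)}{2m}}
      && \text{since $|h^*_B| \le B$} \\
  &\le L_D(h^*_B) + \sqrt{\tfrac{\ln(1/\delta)}{2m}} + \sqrt{\tfrac{B + \ln(2/\delta)}{2m}}
      && \text{by (ii)}.
\end{align*}
% Hence, by a union bound over (i) and (ii), with probability at least 1 - 2\delta,
\[
L_D(h_S) - L_D(h^*_B) \;\le\; \sqrt{\frac{B + \ln(2/\delta)}{2m}} + \sqrt{\frac{\ln(1/\delta)}{2m}}.
\]
% A statement at confidence 1 - \delta follows by splitting the confidence between (i) and (ii),
% at the cost of slightly larger logarithmic terms.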