1. Failure of k-fold cross validation. Consider a case in which the label is chosen at random according to P[y = 1] = P[y = 0] = 1/2. Consider a learning algorithm that outputs the constant predictor h(x) = 1 if the parity of the labels on the training set is 1, and otherwise outputs the constant predictor h(x) = 0. Prove that the difference between the leave-one-out estimate and the true error in such a case is always 1/2.
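The claim can be checked numerically. The sketch below (illustrative, not part of the exercise) implements the parity-based learner and its leave-one-out estimate: since any constant predictor has true error exactly 1/2 under this label distribution, the gap |LOO estimate − 1/2| should come out to 1/2 on every sample.

```python
import random

def parity(labels):
    return sum(labels) % 2

def loo_error(labels):
    """Leave-one-out error of the parity-based learner.

    Training on all labels except labels[i] yields the constant
    predictor h(x) = parity of the remaining labels; we test it
    on the single held-out label."""
    m = len(labels)
    mistakes = 0
    for i in range(m):
        rest = labels[:i] + labels[i + 1:]
        h = parity(rest)  # the constant prediction on the held-out point
        mistakes += (h != labels[i])
    return mistakes / m

random.seed(0)
for _ in range(5):
    y = [random.randint(0, 1) for _ in range(10)]
    est = loo_error(y)
    # True error of any constant predictor is 1/2 (labels are fair coins),
    # so the gap between the LOO estimate and the true error is always 1/2.
    assert abs(est - 0.5) == 0.5
```

The reason is visible in the code: removing y_i from the sample changes the parity from p to p ⊕ y_i, so the held-out prediction equals y_i exactly when p = 0. Hence the LOO estimate is 0 when the full sample has parity 0 and 1 when it has parity 1, while the true error is 1/2 in both cases.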
2. Let H_1, ..., H_k be k hypothesis classes. Suppose you are given m i.i.d. training examples and you would like to learn the class H = ∪_{i=1}^k H_i. Consider two alternative approaches:
_ Learn H on the m examples using the ERM rule.
_ Divide the m examples into a training set of size (1 − α)m and a validation set of size αm, for some α ∈ (0, 1). Then, apply the approach of model selection using validation. That is, first train each class H_i on the (1 − α)m training examples using the ERM rule with respect to H_i, and let ĥ_1, ..., ĥ_k be the resulting hypotheses. Second, apply the ERM rule with respect to the finite class {ĥ_1, ..., ĥ_k} on the αm validation examples.
Describe scenarios in which the first method is better than the second and vice versa.
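The second approach can be sketched concretely. The toy classes below are illustrative choices, not given in the exercise: H_1 is the class of constant predictors and H_2 the class of threshold predictors over 1-D data. Each class is trained by ERM on the (1 − α)m training points, and the finite class of resulting hypotheses is then evaluated by ERM on the αm validation points.

```python
def erm_constant(data):
    """ERM over H_1 = {x -> 0, x -> 1}: return the majority-label constant."""
    ones = sum(y for _, y in data)
    c = 1 if 2 * ones >= len(data) else 0
    return lambda x: c

def erm_threshold(data):
    """ERM over H_2 = {x -> 1[x >= t]}: try thresholds at the sample points."""
    best_h, best_err = None, float("inf")
    for t in [x for x, _ in data] + [float("-inf")]:
        h = lambda x, t=t: int(x >= t)
        err = sum(h(x) != y for x, y in data)
        if err < best_err:
            best_h, best_err = h, err
    return best_h

def empirical_error(h, data):
    return sum(h(x) != y for x, y in data) / len(data)

def select_by_validation(data, alpha=0.2):
    """Approach 2: ERM per class on (1 - alpha)m points, then ERM over the
    finite class of trained hypotheses on the alpha*m validation points."""
    split = int((1 - alpha) * len(data))
    train, val = data[:split], data[split:]
    candidates = [erm_constant(train), erm_threshold(train)]
    return min(candidates, key=lambda h: empirical_error(h, val))
```

The trade-off the exercise asks about shows up here directly: running ERM over the union uses all m examples but pays the (possibly much larger) complexity of H, while the validation approach pays only log k on the validation set at the cost of training on fewer examples.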