1. Learnability without Uniform Convergence: Let B be the unit ball of R^d, let H = B, let Z = B × {0,1}^d, and let ℓ : Z × H → R be defined as follows: ℓ(w, (x, α)) = Σ_{i=1}^{d} α_i (x_i − w_i)².
This problem corresponds to an unsupervised learning task, meaning that we do not try to predict the label of x. Instead, we try to find the “center of mass” of the distribution over B. However, there is a twist, modelled by the vectors α. Each example is a pair (x, α), where x is the instance and α indicates which features of x are “active” and which are “turned off.” A hypothesis is a vector w representing the center of mass of the distribution, and the loss function is the squared Euclidean distance between x and w, but only with respect to the “active” elements of x.
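For concreteness, here is a minimal sketch of this loss in Python/NumPy. The dimension, the sampled vectors, and the variable names are illustrative choices, not part of the exercise.

```python
import numpy as np

def partial_sq_loss(w, x, alpha):
    """ℓ(w, (x, α)) = Σ_i α_i (x_i − w_i)²: squared distance on the active coordinates only."""
    return float(np.sum(alpha * (x - w) ** 2))

# Tiny illustration (all concrete values below are made up for the example).
d = 4
rng = np.random.default_rng(0)
x = rng.normal(size=d)
x /= max(1.0, np.linalg.norm(x))     # keep the instance inside the unit ball B
alpha = rng.integers(0, 2, size=d)   # α ∈ {0,1}^d marks the "active" features
w = np.zeros(d)                      # a candidate hypothesis (center of mass) in H = B
print(partial_sq_loss(w, x, alpha))
```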
- Show that this problem is learnable using the RLM rule with a sample complexity that does not depend on d. (A closed-form sketch of the RLM minimizer for this loss appears after the exercise.)
- Consider a distribution D over Z as follows: x is fixed to be some x0, and each element of α is sampled to be either 1 or 0 with equal probability. Show that the rate of uniform convergence of this problem grows with d.
  Hint: Let m be a training set size. Show that if d ≫ 2^m, then there is a high probability of sampling a set of examples such that there exists some j ∈ [d] for which α_j = 1 for all the examples in the training set. Show that such a sample cannot be ε-representative. Conclude that the sample complexity of uniform convergence must grow with log(d). (A small simulation of this probabilistic claim appears after the exercise.)
- Conclude that if we take d to infinity we obtain a problem that is learnable but for which the uniform convergence property does not hold. Compare to the fundamental theorem of statistical learning.
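As a hedged illustration related to the first part (a sketch, not the exercise's intended proof): for this loss the RLM objective L_S(w) + λ‖w‖² separates across coordinates, so its unconstrained minimizer has a simple closed form. The sizes d, m and the regularization parameter λ below are arbitrary illustrative choices; the data follow the distribution from the second part (x fixed at some x0, α uniform over {0,1}^d), for which the minimizer automatically lies in B.

```python
import numpy as np

def rlm(X, A, lam):
    """Unconstrained minimizer of (1/m)·Σ_i Σ_j A[i,j]·(X[i,j] − w_j)² + lam·‖w‖².
    The objective separates over coordinates, giving a closed form for each w_j."""
    m = X.shape[0]
    num = (A * X).sum(axis=0)          # Σ_i α_{i,j} x_{i,j}
    den = A.sum(axis=0) + lam * m      # Σ_i α_{i,j} + λ·m  (strictly positive since λ·m > 0)
    return num / den

# Illustration with the distribution from the second part: x fixed at x0, α uniform.
rng = np.random.default_rng(1)
d, m, lam = 1000, 50, 0.1              # illustrative sizes and regularization
x0 = rng.normal(size=d)
x0 /= np.linalg.norm(x0)               # x0 ∈ B
X = np.tile(x0, (m, 1))                # every instance equals x0
A = rng.integers(0, 2, size=(m, d))    # each α_j is 1 or 0 with equal probability
w_hat = rlm(X, A, lam)                 # here ‖w_hat‖ ≤ ‖x0‖ ≤ 1, so w_hat ∈ H = B
true_risk = 0.5 * np.sum((x0 - w_hat) ** 2)   # L_D(w) = E_α Σ_j α_j (x0_j − w_j)² = ½‖x0 − w‖²
print(true_risk)                       # small even though d is large
```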
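And a small sanity check of the probabilistic part of the hint (again a sketch with made-up sizes): when d is much larger than 2^m, a sample of m i.i.d. α's almost surely contains a coordinate that equals 1 in every example. The simulation only checks this existence claim; showing that such a sample cannot be ε-representative is left to the exercise.

```python
import numpy as np

# Probability that some coordinate j has α_j = 1 in all m examples.
# Analytically this is 1 − (1 − 2**(−m))**d, which is ≈ 1 once d ≫ 2**m.
rng = np.random.default_rng(2)
m = 8
d = 50 * 2 ** m                        # d ≫ 2^m (illustrative choice)
trials = 200
hits = sum(
    bool(np.any(rng.integers(0, 2, size=(m, d)).min(axis=0) == 1))
    for _ in range(trials)
)
print(hits / trials)                   # empirically ≈ 1
print(1 - (1 - 2.0 ** (-m)) ** d)      # analytic value, also ≈ 1
```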