1. Consider the mapping $\phi : \mathbb{R}^d \to \mathbb{R}^{d+1}$ defined by $\phi(x) = (x, \|x\|^2)$. Show that if $x_1, \ldots, x_m$ are shattered by $B_d$ then $\phi(x_1), \ldots, \phi(x_m)$ are shattered by the class of halfspaces in $\mathbb{R}^{d+1}$ (in this question we assume that $\mathrm{sign}(0) = 1$). What does this tell us about $\mathrm{VCdim}(B_d)$?
2. (*) Find a set of $d+1$ points in $\mathbb{R}^d$ that is shattered by $B_d$. Conclude that $d+1 \le \mathrm{VCdim}(B_d) \le d+2$.
3. Boosting the Confidence: Let $A$ be an algorithm that guarantees the following: There exist some constant $\delta_0 \in (0,1)$ and a function $m_H : (0,1) \to \mathbb{N}$ such that for every $\epsilon \in (0,1)$, if $m \ge m_H(\epsilon)$ then for every distribution $D$ it holds that with probability of at least $1 - \delta_0$, $L_D(A(S)) \le \min_{h \in H} L_D(h) + \epsilon$. Suggest a procedure that relies on $A$ and learns $H$ in the usual agnostic PAC learning model and has a sample complexity of $m_H(\epsilon, \delta) \le k\, m_H(\epsilon) + \left\lceil \frac{2 \log(4k/\delta)}{\epsilon^2} \right\rceil$, where $k = \lceil \log(\delta) / \log(\delta_0) \rceil$.
Hint: Divide the data into $k+1$ chunks, where each of the first $k$ chunks is of size $m_H(\epsilon)$ examples. Run $A$ on each of the first $k$ chunks. Argue that the probability that for all of these chunks we have $L_D(A(S)) > \min_{h \in H} L_D(h) + \epsilon$ is at most $\delta_0^k \le \delta/2$.
Finally, use the last chunk to choose from the k hypotheses that A generated from the k chunks (by relying on Corollary 4.6).
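The procedure outlined in the hint can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference solution. The names `num_repetitions`, `validation_size`, `boost_confidence`, `learner`, `loss`, and `m_weak` (standing in for m_H(ε)) are hypothetical. The loss is assumed to take values in [0, 1], and logarithms are assumed to be natural.

```python
import math


def num_repetitions(delta, delta0):
    """k = ceil(log(delta) / log(delta0)), as in the exercise statement."""
    return math.ceil(math.log(delta) / math.log(delta0))


def validation_size(k, delta, eps):
    """Size of the last chunk: ceil(2 log(4k / delta) / eps^2).

    Natural logarithm is an assumption of this sketch.
    """
    return math.ceil(2 * math.log(4 * k / delta) / eps ** 2)


def boost_confidence(learner, loss, sample, m_weak, eps, delta, delta0):
    """Run `learner` on k disjoint chunks, then select on a held-out chunk.

    learner : hypothetical callable, maps a list of examples to a hypothesis
    loss    : hypothetical callable, loss(h, z) in [0, 1] for an example z
    m_weak  : chunk size, standing in for m_H(eps)
    """
    k = num_repetitions(delta, delta0)
    m_val = validation_size(k, delta, eps)
    needed = k * m_weak + m_val
    if len(sample) < needed:
        raise ValueError(f"need at least {needed} examples, got {len(sample)}")

    # First k chunks: one candidate hypothesis per chunk.
    candidates = [
        learner(sample[i * m_weak:(i + 1) * m_weak]) for i in range(k)
    ]

    # Last chunk: pick the candidate with the smallest empirical risk.
    validation = sample[k * m_weak:needed]

    def empirical_risk(h):
        return sum(loss(h, z) for z in validation) / len(validation)

    return min(candidates, key=empirical_risk)
```

For instance, `num_repetitions(0.1, 0.5)` gives k = 4, so with chunks of size `m_weak` the sketch consumes `4 * m_weak + validation_size(4, 0.1, eps)` examples in total. The selection step is plain empirical risk minimization over the finite set of k candidates, which is the setting the hint points to via Corollary 4.6.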