1. Neural networks are universal approximators: Let f : [−1, 1]^n → [−1, 1] be a ρ-Lipschitz function. Fix some ε > 0. Construct a neural network N : [−1, 1]^n → [−1, 1], with the sigmoid activation function, such that for every x ∈ [−1, 1]^n it holds that |f(x) − N(x)| ≤ ε.
Hint: Similarly to the proof of Theorem 19.3, partition [−1, 1]^n into small boxes. Use the Lipschitzness of f to show that f is approximately constant on each box. Finally, show that a neural network can first decide which box the input vector belongs to, and then output the average value of f on that box.
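The hint's construction can be tried out concretely in one dimension. The sketch below is illustrative and not from the text: the names box_network, m, and k are my own choices. A difference of two steep sigmoids acts as a soft indicator of each box, and the network outputs the value of f at the box centre, so the whole thing is a one-hidden-layer sigmoid network.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))

def box_network(f, m=200, k=2000.0):
    """One-hidden-layer sigmoid network approximating f on [-1, 1].

    Partition [-1, 1] into m boxes; on each box the network outputs
    (approximately) f at the box centre, using a difference of two
    steep sigmoids as a soft indicator of the box. m and k are
    illustrative parameters, not values prescribed by the exercise.
    """
    edges = np.linspace(-1.0, 1.0, m + 1)
    centres = (edges[:-1] + edges[1:]) / 2.0
    values = f(centres)  # f is approximately constant on each small box

    def N(x):
        x = np.asarray(x, dtype=float)[..., None]
        # Soft indicator of box i: sigma(k(x - left_i)) - sigma(k(x - right_i)).
        ind = sigmoid(k * (x - edges[:-1])) - sigmoid(k * (x - edges[1:]))
        return (ind * values).sum(axis=-1)

    return N

f = np.sin  # 1-Lipschitz on [-1, 1]
N = box_network(f)
xs = np.linspace(-0.99, 0.99, 1000)
err = np.max(np.abs(f(xs) - N(xs)))
```

Shrinking the boxes (larger m) and steepening the sigmoids (larger k) drives the uniform error below any fixed ε, mirroring the ρ-Lipschitz argument in the hint.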
2. Prove Theorem 20.5.
Hint: For every f : {−1, 1}^n → {−1, 1} construct a 1-Lipschitz function g : [−1, 1]^n → [−1, 1] such that if you can approximate g then you can express f.
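For the hint above, one candidate extension (a sketch of my own choosing, not necessarily the intended construction) is:

```latex
% A candidate 1-Lipschitz (w.r.t. the \ell_\infty norm) extension of
% f : \{-1,1\}^n \to \{-1,1\} to the cube [-1,1]^n:
g(x) \;=\; f\bigl(\operatorname{sign}(x)\bigr) \cdot \min_{i \in [n]} |x_i|,
% with sign applied coordinate-wise (ties broken arbitrarily).
% On each orthant, g is (\pm 1) times \min_i |x_i|, a minimum of
% 1-Lipschitz functions, and g vanishes continuously on orthant
% boundaries, so g is 1-Lipschitz. At every vertex v \in \{-1,1\}^n
% we have \min_i |v_i| = 1, so g(v) = f(v). Hence if a network N
% satisfies |N(x) - g(x)| \le \epsilon on the cube for some
% \epsilon < 1, then \operatorname{sign}(N(v)) = f(v) at every
% vertex, i.e. approximating g suffices to express f.
```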