1. A probabilistic label predictor is a function that assigns to every domain point x a probability value, h(x) ∈ [0,1], that determines the probability of predicting the label 1. That is, given such an h and an input, x, the label for x is predicted by tossing a coin with bias h(x) toward Heads and predicting 1 iff the coin comes up Heads. Formally, we define a probabilistic label predictor as a function, h : X → [0,1]. The loss of such an h on an example (x, y) is defined to be |h(x) − y|, which is exactly the probability that the prediction of h will not be equal to y. Note that if h is deterministic, that is, returns values in {0,1}, then |h(x) − y| = 1[h(x) ≠ y]. Prove that for every data-generating distribution D over X × {0,1}, the Bayes optimal predictor has the smallest risk (w.r.t. the loss function ℓ(h, (x, y)) = |h(x) − y|) among all possible label predictors, including probabilistic ones.
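A numerical sanity check of the claim (not a substitute for the requested proof): conditioning on x, the expected loss of any predictor decomposes as E_y|h(x) − y| = h(x)(1 − η(x)) + (1 − h(x))η(x), where η(x) = P[y = 1 | x]. The Bayes predictor thresholds η(x) at 1/2, so its pointwise loss min(η(x), 1 − η(x)) lower-bounds that of every h(x) ∈ [0,1]. The sketch below verifies this on a small finite domain; the particular distribution (p_x, eta) is an arbitrary illustrative choice.

```python
import random

# Illustrative distribution over X = {0, 1, 2} (values are our choice):
p_x = {0: 0.5, 1: 0.3, 2: 0.2}   # marginal distribution over domain points
eta = {0: 0.9, 1: 0.4, 2: 0.5}   # eta(x) = P[y = 1 | x]

def risk(h):
    """Risk E_{(x,y)~D} |h(x) - y| of a (possibly probabilistic) predictor h.

    Conditioning on x, the expected loss is
    h(x) * (1 - eta(x)) + (1 - h(x)) * eta(x).
    """
    return sum(p_x[x] * (h[x] * (1 - eta[x]) + (1 - h[x]) * eta[x])
               for x in p_x)

# Bayes optimal predictor: predict 1 iff P[y = 1 | x] >= 1/2.
bayes = {x: 1.0 if eta[x] >= 0.5 else 0.0 for x in p_x}

# Compare against many random probabilistic competitors.
rng = random.Random(0)
for _ in range(1000):
    h = {x: rng.random() for x in p_x}
    assert risk(bayes) <= risk(h) + 1e-12

print(round(risk(bayes), 4))  # → 0.27, i.e. sum_x p_x(x) * min(eta(x), 1 - eta(x))
```

Note that at x = 2, where η(x) = 1/2, every value of h(x) incurs the same conditional loss 1/2, so the Bayes predictor is a minimizer but not the unique one there.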