1. Suppose that you have n distance graphs G1, …, Gn in a document corpus. You create the union of all these distance graphs by taking the union of their nodes and edges and aggregating the weights of any parallel edges. (a) Discuss the relationship between the factorization of the adjacency matrix of this union graph and word-context factorization models. (b) How would you change the factorization objective function to address the effect of wide variation in counts?
2. For each of the CBOW and skip-gram models, do the following: (a) Express the loss function only as a function of the inputs and weights, after eliminating the hidden layer variables. (b) Compute the gradients of the loss function with respect to the weights in the input and output layers.
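
The union construction described in Exercise 1 can be sketched as follows. This is only a minimal illustration of the setup, not a solution to either part of the exercise. It assumes that each distance graph is stored as a dictionary mapping a directed edge (an ordered pair of words) to a weight, and that "aggregating" parallel edges means summing their weights. The function names `union_distance_graphs` and `adjacency_matrix` and the toy graphs are hypothetical.

```python
from collections import Counter


def union_distance_graphs(graphs):
    """Merge edge-weight maps; weights of parallel edges are summed (assumed)."""
    union = Counter()
    for graph in graphs:
        for edge, weight in graph.items():
            union[edge] += weight
    return union


def adjacency_matrix(union):
    """Return (sorted vocabulary, dense adjacency matrix as nested lists)."""
    vocab = sorted({word for edge in union for word in edge})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (src, dst), weight in union.items():
        matrix[index[src]][index[dst]] = weight
    return vocab, matrix


# Toy distance graphs (hypothetical): edge (u, v) means u precedes v.
g1 = {("text", "mining"): 2, ("mining", "text"): 1}
g2 = {("text", "mining"): 1, ("mining", "data"): 3}

merged = union_distance_graphs([g1, g2])
vocab, A = adjacency_matrix(merged)
```

The resulting matrix `A` has one row and one column per distinct word, and entry (i, j) holds the aggregated weight of the edge from word i to word j. This is the matrix whose factorization the exercise asks about.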