1. Suppose you are given a word2vec embedding of each of the d terms in the lexicon. You are also given an n × d document-term matrix D whose rows contain the term frequencies of the n documents over the same lexicon of size d. Propose a heuristic to find the coordinates of the documents in terms of this word embedding.
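One natural heuristic, sketched below under the assumption that the embedding is stored as a d × p matrix E whose i-th row is the word2vec vector of term i: take each document's coordinates to be the term-frequency-weighted average of the embeddings of its words, i.e., a length-normalized row of DE. The function name `document_embeddings` is hypothetical.

```python
import numpy as np

def document_embeddings(D, E):
    """Map an (n x d) document-term matrix D and a (d x p) word-embedding
    matrix E to (n x p) document coordinates by averaging word vectors,
    weighted by term frequency (hypothetical helper; tf-idf weights could
    be substituted for raw frequencies)."""
    X = D @ E                                  # (n x p) sums of word vectors
    lengths = D.sum(axis=1, keepdims=True)     # total term count per document
    return X / np.maximum(lengths, 1)          # average, guarding empty docs
```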
2. Suppose you have additional syntactic features of words, such as part-of-speech tags, orthography, and so on. Show how you would incorporate such features into word2vec.
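One common way to do this, shown as a minimal PyTorch sketch (the class name and the choice of summing rather than concatenating the feature embedding are assumptions, not a prescribed design): give each discrete feature value its own learned embedding and combine it with the word's input embedding before predicting context words, exactly as in the skip-gram architecture.

```python
import torch
import torch.nn as nn

class FeatureAugmentedSkipGram(nn.Module):
    """Skip-gram-style model whose input representation is the sum of a
    word embedding and a part-of-speech embedding; concatenation followed
    by a linear projection would be an equally reasonable choice."""

    def __init__(self, vocab_size, num_pos_tags, dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Embedding(num_pos_tags, dim)
        self.out = nn.Linear(dim, vocab_size)   # scores over context words

    def forward(self, word_ids, pos_ids):
        h = self.word_emb(word_ids) + self.pos_emb(pos_ids)
        return self.out(h)   # train with cross-entropy against context words
```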
3. How can you use an RNN to predict grammatical errors in a sentence?
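One plausible formulation, sketched below with assumed names: cast the task as per-token sequence labeling, in which a (here bidirectional) recurrent network reads the sentence and emits, at each position, the probability that a grammatical error occurs there. Training this model presupposes a corpus annotated with error positions.

```python
import torch
import torch.nn as nn

class ErrorTagger(nn.Module):
    """Bidirectional LSTM that outputs, for each token of the input
    sentence, the probability that a grammatical error occurs at that
    position (hypothetical sketch; a unidirectional RNN would also work)."""

    def __init__(self, vocab_size, dim, hidden):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, token_ids):                   # (batch, seq_len)
        h, _ = self.rnn(self.emb(token_ids))        # (batch, seq_len, 2*hidden)
        return torch.sigmoid(self.score(h)).squeeze(-1)  # per-token error prob.
```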
4. Consider a d × d word-word context matrix C = [c_{ij}] in which c_{ij} is the frequency of word j in the context of word i. The goal is to learn d × p and p × d matrices U and V, respectively, so that applying the softmax to each row of UV matches the relative frequencies in the corresponding row of C. Create a loss function for this probabilistic factorization of C into U and V. Discuss the relationship with the skip-gram model.
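As a worked sketch of one consistent answer (the count-weighted cross-entropy below is an assumption about how to match the softmax outputs to the row-normalized counts, not the only possible loss):

```latex
% Row-wise softmax of UV as the predicted distribution, and a
% cross-entropy loss weighted by the observed co-occurrence counts:
\hat{p}_{ij} = \frac{\exp\bigl((UV)_{ij}\bigr)}{\sum_{k=1}^{d} \exp\bigl((UV)_{ik}\bigr)},
\qquad
L(U, V) = -\sum_{i=1}^{d} \sum_{j=1}^{d} c_{ij} \log \hat{p}_{ij}.
```

Minimizing L drives each row of the softmax of UV toward the relative frequencies c_{ij} / Σ_k c_{ik}. Since the skip-gram model contributes one term log p̂_{ij} for every observed (word, context) pair in the corpus, its aggregate objective is exactly this count-weighted cross-entropy, with the rows of U playing the role of the input (word) embeddings and the columns of V the output (context) embeddings.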