1. Derive the gradient-update equations for factorization machines in binary classification with logistic loss. Then derive the prediction function and the updates for hinge loss.
2. Derive the gradient-descent updates for the optimization problem in Sect. 8.2.5. Discuss the special case of β = 0.
3. Show that the space required by the inverted index is exactly proportional to that required by a sparse representation of the document-term matrix.
4. The index construction of Sect. 9.2.3 assumes that document identifiers are processed in sorted order. Discuss the modifications required when the document identifiers are not processed in sorted order. How much does this modification increase the time complexity?