1. Consider a text corpus with 10^6 documents, a lexicon of size 10^5, and 100 distinct words per document, which is represented as a bag of words with frequencies. (a) What is the amount of space required to store the entire data matrix without any optimization? (b) Suggest a sparse data format to store the matrix and compute the space required.

2. In Exercise 1, let us represent the documents in 0-1 format depending on whether or not a word is present in the document. Compute the expected dot product between a pair of documents in each of which 100 words are included completely at random. What is the expected dot product between a pair with 50,000 words each? What does this tell you about the effect of document length on the computation of the dot product?
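
The arithmetic behind both exercises can be checked with a short Python sketch. It reads the corpus sizes as 10^6 documents and a lexicon of 10^5 words. The 4-byte cost per stored number and the helper names (`dense_bytes`, `sparse_bytes`, `expected_dot_product`) are assumptions made for illustration, not part of the exercises. The sparse layout assumed here is a per-document list of (word id, frequency) pairs, which is one possible answer to 1(b) among several.

```python
from fractions import Fraction

# Corpus sizes as read from Exercise 1.
N_DOCS = 10**6
LEXICON = 10**5
WORDS_PER_DOC = 100

# Assumption (not stated in the exercise): 4 bytes per stored number.
BYTES_PER_VALUE = 4


def dense_bytes(n_docs, lexicon, bytes_per_value=BYTES_PER_VALUE):
    """Bytes for a full n_docs x lexicon matrix, zeros included."""
    return n_docs * lexicon * bytes_per_value


def sparse_bytes(n_docs, words_per_doc, bytes_per_value=BYTES_PER_VALUE):
    """Bytes when each document stores only (word_id, frequency) pairs."""
    return n_docs * words_per_doc * 2 * bytes_per_value


def expected_dot_product(words_per_doc, lexicon):
    """Expected 0-1 dot product of two documents with random word sets.

    A given lexicon word appears in one document with probability
    words_per_doc / lexicon, and the two documents are drawn independently,
    so linearity of expectation gives words_per_doc**2 / lexicon.
    """
    return Fraction(words_per_doc**2, lexicon)


if __name__ == "__main__":
    print(dense_bytes(N_DOCS, LEXICON))                  # 400000000000
    print(sparse_bytes(N_DOCS, WORDS_PER_DOC))           # 800000000
    print(float(expected_dot_product(100, LEXICON)))     # 0.1
    print(float(expected_dot_product(50_000, LEXICON)))  # 25000.0
```

Under these assumptions the dense matrix needs about 400 GB and the sparse pairs about 0.8 GB. The expected dot product grows with the square of the document length, from 0.1 for 100-word documents to 25,000 for 50,000-word documents.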
