Purpose
This task evaluates the student’s technical skills in the management of unstructured data, with potential usage in real
applications.
This assessment supports students' understanding of the techniques related to unstructured data management and
data processing.
Instructions
• Read these instructions.
• Answer as many questions as possible.
• Place your name, ID and answers in your document.
• Please submit your Word file with your answers and graphs (embedded) where appropriate as a SINGLE
document in the Submission Portal.
• Do not submit PDF files.
Question 1 (15 marks)
Suppose you have joined a search engine development team to design a search algorithm based on both the Vector
model and the Boolean model.
You have collected the following three unstructured documents and plan to apply an indexing technique to convert them
into an inverted index.
Doc 1: data science is a field to use scientific method, process, algorithm, system to extract knowledge.
Doc 2: data mining is the process to discover pattern in large data to involve method at the database system.
Doc 3: information system is the study of network of hardware and software that people use to process data.
To answer the questions below, you must provide the detailed procedure step by step.
Question 1.1: In the process of creating the inverted index, please complete the following steps:
Remove all stop words and punctuation. The list of stop words for this task is as follows:
Is, An, That, Use, And, To, From, In, Both, Of, At, The
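The stop-word and punctuation removal step can be sketched in Python as follows. This is a minimal illustration, not a required method: the function name remove_stop_words and the sample sentence are hypothetical, and the sketch assumes that matching is case-insensitive and that tokens are separated by whitespace. Only the stop-word list is taken from the task.

```python
import string

# Stop-word list copied from the task statement, lowercased for matching.
STOP_WORDS = {"is", "an", "that", "use", "and", "to", "from",
              "in", "both", "of", "at", "the"}


def remove_stop_words(text):
    """Return the tokens of `text` with punctuation and stop words removed."""
    # Delete every ASCII punctuation character from the text.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # Lowercase, split on whitespace, and keep only non-stop-word tokens.
    return [tok for tok in cleaned.lower().split() if tok not in STOP_WORDS]


# Hypothetical sample sentence (not one of the task documents).
sample = "The index is built from a list of terms, and each term points to documents."
print(remove_stop_words(sample))
# ['index', 'built', 'a', 'list', 'terms', 'each', 'term', 'points', 'documents']
```

Note that "a" survives because it is not in the stop-word list given for this task; only the listed words are removed.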