An automated answer-rating site marks each post on a community forum website as “good” or “bad” based on the quality of the post. The CSV file, which you can download from OA 9.14, contains various quality measures computed by the tool. The dataset contains the following attributes:
i. num_words: number of words in the post
ii. num_characters: number of characters in the post
iii. num_misspelled: number of misspelled words
iv. bin_end_qmark: whether the post ends with a question mark
v. num_interrogative: number of interrogative words in the post
vi. bin_start_small: whether the post starts with a lowercase letter (“1” means yes, otherwise no)
vii. num_sentences: number of sentences per post
viii. num_punctuations: number of punctuation symbols in the post
ix. label: the label of the post (“G” for good and “B” for bad) as determined by the tool

Create a logistic regression model to predict the class label from the first eight attributes in the dataset. Evaluate the accuracy of your model.
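One way to approach this exercise in Python is sketched below. It assumes pandas and scikit-learn are installed and that the CSV's header row uses exactly the attribute names listed above. The 70/30 stratified train/test split, the feature scaling, and the command-line usage are choices made for this sketch, not requirements of the exercise.

```python
import sys

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The first eight attributes are the predictors; "label" is the class ("G" or "B").
# Assumes the CSV header uses exactly these column names.
FEATURES = [
    "num_words", "num_characters", "num_misspelled", "bin_end_qmark",
    "num_interrogative", "bin_start_small", "num_sentences", "num_punctuations",
]
TARGET = "label"


def fit_and_evaluate(df, test_size=0.3, random_state=42):
    """Fit logistic regression on a training split; return (model, test accuracy)."""
    X, y = df[FEATURES], df[TARGET]
    # Stratify so both splits keep the original G/B proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    # Standardize features first: counts such as num_characters are much larger
    # than the 0/1 flags, and scaling helps the default lbfgs solver converge.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    if len(sys.argv) == 2:
        data = pd.read_csv(sys.argv[1])
        _, acc = fit_and_evaluate(data)
        print(f"Test accuracy: {acc:.3f}")
    else:
        print("usage: python rate_posts.py <path-to-csv>")
```

Run it as `python rate_posts.py answers.csv`, where both file names are hypothetical placeholders for the script and the OA 9.14 download. A single hold-out split can give a noisy estimate on a small dataset. As an alternative, `sklearn.model_selection.cross_val_score(model, X, y, cv=5)` averages accuracy over five folds.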