Hi, can you help me with these questions?

22.4 This exercise concerns the classification of spam email. Create a corpus of spam email and one of non-spam mail. Examine each corpus and decide what features appear to be useful for classification: unigram words? bigrams? message length, sender, time of arrival? Then train a classification algorithm (decision tree, naive Bayes, SVM, logistic regression, or some other algorithm of your choosing) on a training set and report its accuracy on a test set.

22.7 Write a regular expression or a short program to extract company names. Test it on a corpus of business news articles. Report your recall and precision.

The questions are taken from Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition, Prentice Hall, 2010.

I have partial solutions for the problems. This one is for 22.4:

22.4 There are several open-source implementations for performing classification. You could use one of those, although implementing your own is always encouraged. Apply the spam-filtering algorithm to the set of features you have extracted (unigrams, bigrams and trigrams) and evaluate it on the test portion of your data set (you could do an 80:20 split, i.e. 80% for training (or learning) and 20% for testing). The expected accuracy is high (above 80%); for anything lower, the selected features need to be re-evaluated and new features may need to be introduced.

And this one is for 22.7:

22.7 The simplest approach is to look for a string of capitalized words followed by “Inc” or “Co.” or “Ltd.” or similar markers. A more complex approach is to get a list of company names (e.g. from an online stock service), look for those names as exact matches, and also extract patterns from them. Reporting recall and precision requires a clearly defined corpus, which should include at least 10 news articles, each mentioning a number of company names.
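One example regular expression (backslashes restored, "Corporation" rejoined, literal dots escaped, and the trailing empty alternative removed, since an empty branch lets the lookahead match anywhere and makes the suffix optional):

[0-9]*\s(([A-Z&]+[a-z.();’]*\s)+)(?=Company|company|Inc\.|Inc|Co\.|Corporation|Limited|Ltd\.)

A good regular expression should have both precision and recall around or above 50%, but somewhat lower than that is acceptable.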
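To report the numbers 22.7 asks for, the extracted names have to be compared against a hand-labelled gold list for each article. The Python sketch below does this with a deliberately simpler pattern than the one above (one or more capitalized tokens followed by a suffix such as Inc, Corp, Co, Ltd or Company) and micro-averaged precision and recall. The pattern, the helper names `extract_companies` and `precision_recall`, and the three toy sentences are all invented for illustration; they say nothing about performance on a real news corpus.

```python
import re

# Hypothetical simplified pattern (not the regex above): one or more
# capitalized tokens, then a corporate suffix, then an optional period.
COMPANY_RE = re.compile(
    r"\b(?:[A-Z][\w&'-]*\s+)+(?:Inc|Corp|Corporation|Co|Company|Ltd|Limited)\b\.?"
)


def extract_companies(text):
    """Return the set of candidate company names found in text."""
    # Drop a trailing period so "Umbrella Co." and "Umbrella Co" compare equal.
    return {m.group(0).rstrip(".") for m in COMPANY_RE.finditer(text)}


def precision_recall(docs, gold):
    """Micro-averaged precision and recall over parallel lists of texts and gold sets."""
    tp = fp = fn = 0
    for text, expected in zip(docs, gold):
        found = extract_companies(text)
        tp += len(found & expected)   # correct extractions
        fp += len(found - expected)   # extracted but not a gold name
        fn += len(expected - found)   # gold names that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy sentences with hand-made gold labels; a real evaluation
# needs a clearly defined corpus of at least 10 labelled news articles.
DOCS = [
    "Shares of Acme Widgets Inc. rose 5% after the merger with Globex Corporation.",
    "Yesterday Initech Ltd announced layoffs; Umbrella Co. declined to comment.",
    "Analysts at Stark Industries expect growth.",
]
GOLD = [
    {"Acme Widgets Inc", "Globex Corporation"},
    {"Initech Ltd", "Umbrella Co"},
    {"Stark Industries"},
]

if __name__ == "__main__":
    p, r = precision_recall(DOCS, GOLD)
    print(f"precision={p:.2f} recall={r:.2f}")
```

On these toy sentences the pattern finds 3 of the 5 gold names and produces one false positive, "Yesterday Initech Ltd", so precision is 0.75 and recall 0.60. The sentence-initial "Yesterday" shows why a pure capitalization rule costs precision, and "Stark Industries" (no suffix) shows why it costs recall; the list-matching approach from the partial solution targets the second problem.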
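For 22.4, the partial solution leaves the choice of classifier open. Naive Bayes is one of the options the exercise lists, so here is a standard-library Python sketch of a multinomial naive Bayes with add-one (Laplace) smoothing over unigram and bigram features, plus a shuffled 80:20 train/test split. The names `features`, `NaiveBayes`, `split_80_20` and `accuracy`, and the six toy messages, are assumptions for illustration rather than a reference solution; with a real corpus, report accuracy only on the held-out 20%.

```python
import math
import random
import re
from collections import Counter


def features(text):
    """Unigram and bigram features from lowercased word tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # used for the class priors
        self.feat_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(features(text))
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(class) + sum of log P(feature | class), in log space to avoid underflow
            score = math.log(self.doc_counts[c] / n_docs)
            for f in features(text):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feat_counts[c][f] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def split_80_20(examples, seed=0):
    """Shuffle with a fixed seed, then keep 80% for training and 20% for testing."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    cut = int(0.8 * len(data))
    return data[:cut], data[cut:]


def accuracy(model, examples):
    """Fraction of (text, label) pairs the model labels correctly."""
    if not examples:
        return 0.0
    return sum(model.predict(t) == y for t, y in examples) / len(examples)


# Invented toy messages; replace with your own spam and non-spam corpora.
TOY = [
    ("Win a free prize now", "spam"),
    ("Claim your free prize money now", "spam"),
    ("Cheap pills free shipping", "spam"),
    ("Meeting moved to Monday afternoon", "ham"),
    ("Can you send the project report", "ham"),
    ("Lunch on Monday with the team", "ham"),
]

if __name__ == "__main__":
    train, test = split_80_20(TOY)
    model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
    print(f"test accuracy: {accuracy(model, test):.2f}")
```

Trained on all six toy messages, the model labels "free prize waiting" as spam and "team meeting on Monday" as ham. Trigrams can be added by extending `features` in the same way as the bigrams; six messages are far too few for the accuracy figure to mean anything.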