# M03 Assignment – Spam Email Filter

I’m stuck on a Mathematics question and need an explanation.

**Spam Email Filters.**

This assignment is Question #60 on page 219 of the textbook (Anderson, D.R., et al., 2020).

A study by *Forbes* indicated that the five most common words appearing in spam emails are *shipping!*, *today!*, *here!*, *available*, and *fingertips!*. Many spam filters separate spam from ham (email not considered to be spam) by applying Bayes' theorem. Suppose that for one email account, 1 in every 10 messages is spam, and that the proportions of spam messages containing each of these five words are as follows.

| Word | Proportion of spam messages containing the word |
|---|---|
| shipping! | 0.051 |
| today! | 0.045 |
| here! | 0.034 |
| available | 0.014 |
| fingertips! | 0.014 |

Also suppose that the proportions of ham messages that have these words are:

| Word | Proportion of ham messages containing the word |
|---|---|
| shipping! | 0.0015 |
| today! | 0.0022 |
| here! | 0.0022 |
| available | 0.0041 |
| fingertips! | 0.0011 |
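
Every part of the question relies on the same calculation. Using $S$ for spam, $H$ for ham, and $w$ for a single word (notation chosen here for illustration), Bayes' theorem combines the 1-in-10 prior with the two proportions listed for that word:

$$
P(S \mid w) = \frac{P(S)\,P(w \mid S)}{P(S)\,P(w \mid S) + P(H)\,P(w \mid H)},
\qquad P(S) = 0.10,\quad P(H) = 0.90,
$$

and $P(H \mid w) = 1 - P(S \mid w)$. Multiplying by 100 gives the percentages the questions ask for.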

a. If a message includes the word *shipping!*, what is the percent probability the message is spam? If a message includes the word *shipping!*, what is the percent probability the message is ham? Should messages that include the word *shipping!* be flagged as spam?
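
A minimal Python sketch of the part (a) calculation. It assumes the 1-in-10 prior and the *shipping!* proportions from the tables. The names `posterior_spam`, `P_SPAM`, and `P_HAM` are hypothetical and used only for illustration. The same arithmetic can be entered as cell formulas in the required Excel sheet.

```python
# Hypothetical helper names; priors come from "1 in every 10 messages is spam".
P_SPAM = 0.10
P_HAM = 1 - P_SPAM


def posterior_spam(p_word_given_spam: float, p_word_given_ham: float) -> float:
    """Bayes' theorem: probability a message is spam given that it contains the word."""
    joint_spam = P_SPAM * p_word_given_spam  # P(spam and word)
    joint_ham = P_HAM * p_word_given_ham     # P(ham and word)
    return joint_spam / (joint_spam + joint_ham)


# Proportions for "shipping!" from the spam and ham tables.
p_spam = posterior_spam(0.051, 0.0015)
p_ham = 1 - p_spam  # complement: the message is ham

print(f"P(spam | shipping!) = {p_spam * 100:.2f}%")  # 79.07%
print(f"P(ham | shipping!)  = {p_ham * 100:.2f}%")   # 20.93%
```

The problem does not name a cutoff. Under a simple 50% cutoff, which is an assumption made here, a message containing *shipping!* would be flagged as spam.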

b. If a message includes the word *today!*, what is the percent probability the message is spam? If a message includes the word *here!*, what is the percent probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
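
The part (b) comparison can be sketched the same way. This assumes the same 1-in-10 prior, and `posterior_spam` is again a hypothetical helper name. Each word maps to its (spam proportion, ham proportion) pair from the tables.

```python
P_SPAM = 0.10          # prior: 1 in every 10 messages is spam
P_HAM = 1 - P_SPAM


def posterior_spam(p_word_given_spam: float, p_word_given_ham: float) -> float:
    """Bayes' theorem for a single word (hypothetical helper)."""
    joint_spam = P_SPAM * p_word_given_spam
    joint_ham = P_HAM * p_word_given_ham
    return joint_spam / (joint_spam + joint_ham)


# (P(word | spam), P(word | ham)) from the tables
words = {
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
}

results = {word: posterior_spam(*rates) for word, rates in words.items()}
for word, p in results.items():
    print(f"P(spam | {word}) = {p * 100:.2f}%")
# P(spam | today!) = 69.44%
# P(spam | here!) = 63.20%
```

Both words appear in the same proportion of ham messages (0.0022). The difference therefore comes from the spam side: *today!* appears in a larger share of spam, so it gives the higher posterior.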

c. If a message includes the word *available*, what is the percent probability the message is spam? If a message includes the word *fingertips!*, what is the percent probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
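
A sketch for part (c) under the same assumptions (1-in-10 prior, hypothetical `posterior_spam` helper) and using the proportions for *available* and *fingertips!*.

```python
P_SPAM = 0.10          # prior: 1 in every 10 messages is spam
P_HAM = 1 - P_SPAM


def posterior_spam(p_word_given_spam: float, p_word_given_ham: float) -> float:
    """Bayes' theorem for a single word (hypothetical helper)."""
    joint_spam = P_SPAM * p_word_given_spam
    joint_ham = P_HAM * p_word_given_ham
    return joint_spam / (joint_spam + joint_ham)


# (P(word | spam), P(word | ham)) from the tables
words = {
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}

results = {word: posterior_spam(*rates) for word, rates in words.items()}
for word, p in results.items():
    print(f"P(spam | {word}) = {p * 100:.2f}%")
# P(spam | available) = 27.50%
# P(spam | fingertips!) = 58.58%
```

The two words have equal spam proportions (0.014). The difference comes entirely from the ham side: *fingertips!* is much rarer in ham (0.0011 versus 0.0041), which raises its posterior.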

d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes' theorem to work effectively?
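
Bayes' theorem can also be written in odds form: posterior odds equal prior odds multiplied by the likelihood ratio $P(w \mid S) / P(w \mid H)$. The sketch below is an illustration, not a method taken from the textbook. It ranks all five words by that ratio, assuming the 1-in-10 prior. The names `rates`, `PRIOR_ODDS`, and `ranked` are hypothetical.

```python
# (P(word | spam), P(word | ham)) for all five words, from the tables
rates = {
    "shipping!": (0.051, 0.0015),
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}

PRIOR_ODDS = 0.10 / 0.90  # 1 spam message for every 9 ham messages

# Rank words by likelihood ratio, largest first.
ranked = sorted(rates, key=lambda w: rates[w][0] / rates[w][1], reverse=True)

for word in ranked:
    p_s, p_h = rates[word]
    ratio = p_s / p_h                     # how much more common the word is in spam
    post_odds = PRIOR_ODDS * ratio        # Bayes' theorem in odds form
    p_spam = post_odds / (1 + post_odds)  # convert odds back to a probability
    print(f"{word:<12} ratio = {ratio:6.2f}  P(spam | word) = {p_spam * 100:.2f}%")
```

The prior is the same for every word, so the posterior increases with the ratio. The ranking by ratio (*shipping!*, *today!*, *here!*, *fingertips!*, *available*) therefore matches the ranking by spam probability. A word is a strong indicator when it is common in spam and rare in ham. A high spam proportion alone is not enough.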

**Requirement:**

Submit your Excel spreadsheet and written conclusion here, with all probabilities expressed as percentages. Your conclusion can be written in Excel or in a Word document. You will be graded on your completion of the calculations in Excel and the explanation of your conclusion. Remember to use good grammar, spelling, and punctuation in your explanation, and include citations if references are used to support your explanation. This assignment is worth 50 points.