Spam and statistics

Say "false positive," and you immediately dive into a tough world: the statistics of diagnostic tests. The terms false positive and false negative (and their cousins, true positive and true negative) are fairly easy to define. But turning the number of false positives and false negatives into easy-to-digest statistics is harder, because the anti-spam community has not agreed on which numbers to use across products.
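To make the ambiguity concrete, here is a minimal Python sketch of two statistics that could each plausibly be reported as a "false-positive" number. The names `Counts`, `fp_share_of_legitimate` and `fp_share_of_flagged` are hypothetical, the counts are invented for illustration, and nothing here is meant as the definition used in the testing.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives: spam correctly marked as spam
    fp: int  # false positives: legitimate mail wrongly marked as spam
    tn: int  # true negatives: legitimate mail correctly delivered
    fn: int  # false negatives: spam that reached the mailbox


def fp_share_of_legitimate(c: Counts) -> float:
    """False positives as a fraction of all legitimate mail: FP / (FP + TN)."""
    return c.fp / (c.fp + c.tn)


def fp_share_of_flagged(c: Counts) -> float:
    """False positives as a fraction of all mail marked as spam: FP / (FP + TP)."""
    return c.fp / (c.fp + c.tp)


# Hypothetical counts, not taken from any test results.
example = Counts(tp=900, fp=10, tn=990, fn=100)

print(fp_share_of_legitimate(example))  # 10 / 1000
print(fp_share_of_flagged(example))     # 10 / 910, about 0.011
```

Both figures come from the same 10 misclassified messages, yet they differ because the denominators differ. The gap widens as the mix of spam and legitimate mail changes, which is one reason a bare "false-positive rate" is hard to compare across products.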
A spam filter is a diagnostic test. For some set of thresholds, it will say "this is spam" or "this is not spam." In our testing, we didn't expose those thresholds. Instead, we asked the vendors to pick thresholds that would keep the false-positive rate below 1 percent. Interestingly enough, none of the vendors asked what we meant by false-positive rate. Depending on your tolerance for false negatives (spam in your mailbox) or false positives (mail mismarked as spam, and so lost or delayed), you might want to set these thresholds differently.
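The threshold trade-off can be sketched in a few lines of Python. This assumes a filter that assigns each message a numeric spam score and marks it as spam when the score meets a threshold; real products may combine several thresholds and work quite differently. It also assumes the false-positive rate is measured against all legitimate mail, which is only one of the possible definitions. The scores and function names are hypothetical.

```python
def false_positive_rate(scores, is_spam, threshold):
    """Fraction of legitimate messages marked as spam (score >= threshold)."""
    legit = [s for s, spam in zip(scores, is_spam) if not spam]
    return sum(1 for s in legit if s >= threshold) / len(legit)


def false_negative_rate(scores, is_spam, threshold):
    """Fraction of spam messages that get through (score < threshold)."""
    spam_scores = [s for s, spam in zip(scores, is_spam) if spam]
    return sum(1 for s in spam_scores if s < threshold) / len(spam_scores)


# Hypothetical scores: four legitimate messages followed by four spam messages.
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
is_spam = [False, False, False, False, True, True, True, True]

for threshold in (0.35, 0.65, 0.85):
    fpr = false_positive_rate(scores, is_spam, threshold)
    fnr = false_negative_rate(scores, is_spam, threshold)
    print(f"threshold={threshold}: false positives {fpr:.0%}, false negatives {fnr:.0%}")
```

With this toy data, raising the threshold from 0.35 to 0.65 removes the one false positive but lets one spam message through, and raising it further to 0.85 lets through three of the four. Which point on that curve is right depends on how much lost legitimate mail you can tolerate.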

Written by Joel Snyder, 27 Oct. 03 22:00