Hand to spam combat: what every CIO should know
- 05 November, 2003 22:00
You have to make some decisions about controlling spam -- and make them soon. In the rapidly escalating cyber-war on internet pollution there's no quarter given, and the firefight is turning vicious. Barbarians are swarming out of the badlands, but there's no shortage of anti-spam warriors promising that they alone possess the ultimate weapon. By now you're probably sick to death of hearing about the spam problem, but have you been able to make an informed decision as to what to do about it? While lengthy debate on the feasibility of anti-spam legislation continues, and internet service providers scramble to install some kind of solution, a word to the wise: beware of low-cost, quick and easy fixes as you sit at the helm of your organisation's fight for perimeter security. There are filters and there are filters, and not all are created equal. Some can even be downright hazardous.
It's becoming impossible to pick up the IT section of any publication without confronting grim statistics of the kind that accompany global warming. In case you haven't heard, since 2001 the amount of spam received by an average email user has increased from 3.7 to 6.2 messages per day (according to Jupiter Research). Furthermore, MessageLabs predicts that spam levels will increase to at least 50% of all email by the end of 2003. In the US, the average cost per employee for time wasted deleting spam messages is around $750 a year, and rising fast.
More bad news: from the spammer's point of view, theirs is a highly profitable billion-dollar-a-year industry that isn't going to disappear anytime soon! The good news is that there's powerful technology available for a savvy CIO to deploy as a defence against the unsolicited and wholesale invasion of corporate mailboxes.
A brief history of email filtering technology may help clarify the current situation. First-generation spam filters appeared a couple of years ago, when unsolicited commercial emails first began to appear in internet mailboxes. At that stage, very few people suspected the scale of the coming onslaught, or what the enemy would be capable of. These early filters are commonly referred to as "rules-based", since they look for certain pre-defined keywords and/or phrases in the subject or contents of an incoming email. Spammers laugh all the way around them! These primitive filters also make extensive use of "black" and "white" lists, which attempt to block messages from undesirable senders while allowing mail through from friends and associates.
Unfortunately, these simple technologies are extremely easy to fool, and their rules and address lists are very time-consuming to maintain. Users may be required to devise ever more complex rules in order to block spammers, or must regularly download an updated rules database from an anti-spam (AS) vendor. The junk-recognition accuracy rate for such filters varies widely, from 10% to 45%, and they're now generally considered to be obsolete technology.
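To make the weakness concrete, here is a minimal sketch of how a rules-based filter of this kind might work. The phrases, addresses and the `is_spam` function are all hypothetical, invented purely for illustration; no real product is being described.

```python
# Hypothetical first-generation filter: keyword rules plus black/white lists.
# Every phrase and address below is made up for illustration.
BLOCKED_PHRASES = {"free money", "viagra", "act now"}
BLACKLIST = {"bulk@spam.example"}
WHITELIST = {"colleague@corp.example"}


def is_spam(sender: str, subject: str, body: str) -> bool:
    """Return True if the sender is blacklisted or a keyword rule matches."""
    sender = sender.lower()
    if sender in WHITELIST:      # friends and associates always get through
        return False
    if sender in BLACKLIST:      # known undesirable senders are always blocked
        return True
    text = f"{subject} {body}".lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


# A plainly worded message is caught by the keyword rule...
print(is_spam("x@unknown.example", "FREE MONEY inside", "..."))   # True
# ...but a trivially disguised copy slips straight past it.
print(is_spam("x@unknown.example", "FR.EE M0NEY inside", "..."))  # False
```

Every new disguise needs a new rule, which is why the upkeep of such lists never ends.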
Then came second-generation filters, utilising more sophisticated techniques such as challenge-response and hashing algorithms to block spam. These often use "honey-pot" or "probe" email accounts, created specifically for collecting unsolicited bulk email. A digital snapshot or "signature" is constructed from spam messages, building huge (and cumbersome to manage) databases. Probe accounts must be monitored and vetted by teams of technicians working 24x7, and the resulting "signature" databases dispersed to clients as rapidly and frequently as possible. To get past such filters, a cunning spammer simply adds random words, varies the word order, or adds nonsense HTML comments inside various words, thereby causing their junk email to be mis-classified as good (termed a "false negative").
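The signature approach, and the ease with which it is evaded, can be sketched in a few lines. This is an assumption-laden illustration rather than any vendor's method: the SHA-256 digest, the whitespace normalisation and the sample messages are all choices made for the example.

```python
import hashlib


def signature(body: str) -> str:
    """Digest of the message body after collapsing case and whitespace."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Signatures built from spam caught by a hypothetical probe ("honey-pot") account.
probe_spam = ["Buy cheap pills today"]
known_signatures = {signature(message) for message in probe_spam}


def is_known_spam(body: str) -> bool:
    """True only when the body's signature is already in the database."""
    return signature(body) in known_signatures


exact_copy = "Buy   CHEAP pills today"
# Same pitch, with a nonsense HTML comment inside a word and a random word added.
disguised = "Buy ch<!-- x7 -->eap pills today zebra"

print(is_known_spam(exact_copy))  # True
print(is_known_spam(disguised))   # False: a "false negative"
```

Because any change to the text produces a completely different digest, the database has to grow with every variation a spammer sends.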
For systems administrators, these products are often high-maintenance and labour-intensive. Although the initial purchase cost may be negligible, even free, the long-term cost of ownership is considerably higher. Furthermore, organisations that receive large volumes of inbound messages will find that what gets through this type of filter can be of a highly objectionable nature, as spammers become ever more adept at disguising the subject headers and content of their messages. The spam-recognition accuracy for second-generation filters varies from 35% to (at best) 70%.
With first- and second-generation filters, which typically cost around US$10-$35 per user per year, the accuracy rate will perhaps be high enough to satisfy a user with a minimal spam problem. For long-time internet users whose inboxes are extremely cluttered, and for organisations with large volumes of email, allowing 20-30% of spam and pornographic messages to penetrate is probably unacceptable.
The latest category of filters is the third-generation "heuristic" system (a problem-solving program that utilises self-educating techniques to improve performance). These AS systems use advanced probability and statistical techniques to scan the entire content of an email, including headers, body, and attachments. They break the message up into "tokens" or "clues", and classify a piece of mail as spam or not using expert system technology. For example, a well-designed and trained "Bayesian" filter can deliver an accuracy rate above 99%, with a false positive rate (that's desirable email incorrectly identified as spam) of better than 1 per 100,000. Even more significantly, such systems are "adaptive", meaning they learn from their mistakes. Very quickly. This feature can reduce administrative overheads to near zero.
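For the curious, the token-based idea can be sketched as a toy naive Bayes classifier. This is a deliberately simplified illustration, not the algorithm of any particular product: the `BayesFilter` class, the add-one smoothing and the four training messages are all assumptions made for the example, and a real filter would be trained on thousands of messages.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Toy naive Bayes spam filter (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # messages containing each token
        self.messages = {"spam": 0, "ham": 0}                 # messages seen per class

    @staticmethod
    def tokens(text):
        """Break a message into lower-case word tokens."""
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        """Learn from one message labelled 'spam' or 'ham' (good mail)."""
        self.counts[label].update(set(self.tokens(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Combine per-token evidence into a probability that the mail is spam."""
        # Start from the smoothed prior odds, then add each token's log-odds.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(self.tokens(text)):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


f = BayesFilter()
f.train("cheap pills buy now", "spam")
f.train("win cash now", "spam")
f.train("meeting agenda for monday", "ham")
f.train("budget report attached", "ham")

print(f.spam_probability("buy cheap pills"))        # well above 0.5
print(f.spam_probability("monday budget meeting"))  # well below 0.5

# "Adaptive": the user marks a wrongly quarantined message as good mail,
# and the score for that kind of message drops.
before = f.spam_probability("lunch now?")
f.train("lunch now?", "ham")
after = f.spam_probability("lunch now?")
print(before > after)
```

The last three lines show the learning step in miniature: one correction from the user shifts the statistics, with no rule for anyone to write or maintain.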
Since the definition of spam is personal to each user, the more sophisticated versions of these filters support personalisation by providing customers with a web interface where quarantined and "good" email can be reviewed. An individual can then decide what is spam, according to their own preferences, and train the filter accordingly. These modern filters aren't limited by simplistic rules-based technology and don't rely on white or black lists that need to be continually maintained and updated.
Dr. Paul Graham, organiser of the 2003 MIT spam conference and an authority on anti-spam tools, has stated ("Will filters kill spam?", December 2002) that "Statistical (Bayesian) spam filters represent a more serious effort to fight back. If they don't put an end to spam, they'll at least ensure that we see less of it."
So there, in a not-too-technical nutshell, are the types of filter choices available in the arsenal of tools to combat spam. There are desktop and server-side products, with low, moderate and high rates of accuracy and reliability. Some can be downloaded from the internet free, but the more sophisticated and reliable tools come at a price, to which most industrial-strength enterprise versions add an installation fee and support charges.
Spam is a costly and annoying intrusion in the workplace, and control of employees' email privacy is in jeopardy without the protection provided by a well-engineered filter. A thorough analysis of anti-spam tools should take into consideration the ease of administration and integration, the cost of monitoring and maintenance, and of course, the option for individual users to set their own preferences for what is and isn't spam.
Richard Jowsey is executive director of the Death2Spam Project: http://www.death2spam.net.