RIA     3-D CAPTCHA      
Receiver Initiated Authentication: A Practical Method to Authenticate Incoming Email

 

Michael G. Kaplan

 A PDF version is available here.

 

 

I can address any feedback, questions, or concerns via email.  A separate invention of mine, a CAPTCHA that is invulnerable to automated decryption, is detailed via a link on the menu bar above.

 


Abstract
A practical method for authenticating all incoming email is described. Existing SPF records authenticate some email, but the requirement that domain administrators perfectly maintain their SPF records profoundly limits the utility of SPF. A rapidly compiled and near comprehensive Receiver Generated SPF database will ensure almost universal email authentication. This is achieved by bouncing difficult-to-classify email along with a request to simply resend the bounce. The domain and sending MTA from the now authenticated resent email will be entered into a single shared Receiver Generated SPF database. All future non-forwarded email from this domain sent via this server will be authenticated after consulting this database. Uniquely generated sub-addresses can, as an option, be sent with all outgoing email; forwarded email is in effect authenticated by these sub-addresses, thereby rectifying a major flaw of conventional SPF. Email that is clearly spam will be deleted regardless of the presence of these optional sub-addresses. Auto-Resend software, an optional but very useful and trivial-to-implement upgrade to email clients and mail servers, will transparently resend bounces that correspond to recently sent emails. Nearly all email will be authenticated and spam will be almost completely blocked; with almost no exceptions, users will not have to alter their current behavior. This system will work without requiring others to alter their software.


 

1 Introduction
Spam is frequently sent using a spoofed address, while spam sent with an authenticated address can be recognized with near certainty.1 Universal email authentication would be an extraordinary tool for defeating spam yet it remains an apparently impossible ideal. This goal can be achieved by the following unique and practical method.


2 Background
2.1 The Major Weakness of Existing Email Authentication Schemes
Existing email authentication schemes, namely SPF/Sender ID and DKIM, share the same major weakness: universal authentication would require the administrators of every email domain in the world to individually configure/register their domains and to continually update their records as servers are added. The number of different email networks is enormous and constantly expanding. Many of the largest, most motivated networks have become SPF or DKIM compliant. 75% of Fortune 100 companies use SPF and 45% employ DKIM for their marketing email. Unfortunately, motivation to adopt email authentication drops precipitously once we leave the realm of the Fortune 100 companies. IronPort predicts the adoption rate will level off at 50 percent or so by late 2007.2 Though many popular domains are authentication compliant, the absolute percentage of compliant domains is only about 3%.3 Roughly 40 percent of all legitimate mail received by Hotmail users is authenticated using Sender ID.4

2.2 Existing Statistical Filters Are Highly Impressive but Incomplete
Existing statistical filters already do a remarkable job of classifying email. Most email is unambiguously and correctly classified as either spam or ham. A small amount of email is classified as unsure and nearly all successfully delivered spam comes from this small unsure group.5 Often more than 96% of spam is identified with certainty.6


3 Receiver Initiated Authentication Can Authenticate All Incoming Email
The centerpiece of this system is a rapidly constructed, nearly comprehensive SPF database that is independent of the participation of domain administrators. This system is designed to: 1) generate this SPF database in the most practical and transparent way possible, 2) authenticate the small number of emails missed by this SPF database in the most transparent way possible, and 3) challenge the minuscule fraction of emails that are unclear despite successful authentication. Ideally the visible challenge rate will approach the false-positive rate of existing anti-spam systems, thus negating any reasonable argument against this challenge-issuing system.

3.1 Overview
This system, termed Receiver Initiated Authentication (RIA), is summarized in Figure 1. Figure 1A depicts RIA as employed without sub-addresses and Figure 1B depicts RIA with the use of sub-addresses. This paper will focus on RIA as deployed with sub-addresses as depicted in Figure 1B. RIA will authenticate all email regardless of the use of sub-addresses, but it is more efficient (i.e. fewer bounces) with sub-addresses, and their use as employed by RIA is almost totally non-disruptive. Email system administrators who are sub-address averse can still employ RIA as depicted in Figure 1A.
 
 

 

Figure 1: How RIA processes email A) without the use of sub-addresses and B) with sub-addresses.

 

Unique sub-addresses are dispatched in the ‘From’ field with routine outgoing email. RIAuser@domain.com may send RIAuser^85nxsm@domain.com to one individual and RIAuser^n4sw5z@domain.com to another individual. When these correspondents reply they will naturally use these unique sub-addresses. It should be noted, however, that correspondents can initiate emails to RIAuser@domain.com without bothering to use a sub-address; this system almost without exception does not require users to alter their current behavior.
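As an illustration of the addressing scheme just described, the following is a minimal sketch of how a mail system might mint and parse sub-addresses. The `^` separator and the six-character tag follow the examples above; the alphabet, the function names, and the `issued` bookkeeping dictionary are assumptions made for this sketch and are not specified by RIA.

```python
import secrets
import string
from typing import Optional

# Assumed tag alphabet and length, chosen to resemble "85nxsm" in the text.
ALPHABET = string.ascii_lowercase + string.digits
TAG_LENGTH = 6
SEPARATOR = "^"


def new_sub_address(address: str, issued: dict, correspondent: str) -> str:
    """Mint a fresh sub-address for `address` and record who received it."""
    local, domain = address.rsplit("@", 1)
    while True:
        tag = "".join(secrets.choice(ALPHABET) for _ in range(TAG_LENGTH))
        if tag not in issued:  # retry on the rare collision
            break
    issued[tag] = correspondent
    return f"{local}{SEPARATOR}{tag}@{domain}"


def split_sub_address(address: str) -> tuple:
    """Split 'user^tag@domain' into ('user@domain', 'tag'); tag is None if absent."""
    local, domain = address.rsplit("@", 1)
    base, sep, tag = local.partition(SEPARATOR)
    plain = f"{base}@{domain}"
    found: Optional[str] = tag if sep else None
    return plain, found
```

Mail addressed to the plain address parses with a tag of `None`, which matches the point that correspondents are never required to use a sub-address.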

Incoming email will continue to receive a spam probability score via a statistical filter although now the presence of a sub-address will weigh into this score. Email that is clearly ham will be passed to the inbox regardless of whether a sub-address is used. Likewise email classified as clearly spam is deleted regardless of the use of a sub-address.

The small fraction of email that is classified as unsure is handled very differently. If the ‘unsure’ email is sent with a sub-address then the sub-address will be deactivated. The entire ‘unsure’ email is then bounced to the sender with a new sub-address as demonstrated in Figure 2.



Figure 2: RIA generated bounce with sub-address viewable as plain text.

The sender now hits ‘Reply’ and sends back the email with the included text. The email, now authenticated, will arrive in the inbox of RIAuser@domain.com. The now extraneous text will be parsed from this reply email so that RIAuser@domain.com will see the email only in its original form.

The receipt of this resent bounce will reactivate the original sub-address that had been initially deactivated. Both the new sub-address and the old sub-address will be usable.
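The handling of ham, spam, and 'unsure' email, together with the deactivation and reactivation of sub-addresses, can be sketched as a small state machine. This is only a sketch under stated assumptions: the two thresholds, the `Mailbox` class, and the counter-based tag generator are hypothetical, and the score passed in is assumed to already include whatever weight the filter gives to the presence of a sub-address.

```python
from dataclasses import dataclass, field
from typing import Optional

HAM_BELOW = 0.10   # hypothetical cutoff for "clearly ham"
SPAM_ABOVE = 0.90  # hypothetical cutoff for "clearly spam"


@dataclass
class Mailbox:
    active: set = field(default_factory=set)      # usable sub-address tags
    suspended: set = field(default_factory=set)   # tags deactivated by a bounce
    pending: dict = field(default_factory=dict)   # new tag -> suspended old tag or None
    _counter: int = 0

    def _new_tag(self) -> str:
        # Stand-in for a random tag generator.
        self._counter += 1
        return f"tag{self._counter}"

    def receive(self, score: float, tag: Optional[str]) -> str:
        """Route one incoming message given its spam probability score."""
        if score < HAM_BELOW:
            return "inbox"      # clearly ham, with or without a sub-address
        if score > SPAM_ABOVE:
            return "deleted"    # clearly spam, with or without a sub-address
        # 'Unsure': deactivate the sub-address used, then bounce with a new one.
        old = None
        if tag in self.active:
            self.active.discard(tag)
            self.suspended.add(tag)
            old = tag
        new_tag = self._new_tag()
        self.pending[new_tag] = old
        return f"bounced:{new_tag}"

    def receive_resent_bounce(self, new_tag: str) -> str:
        """A resent bounce authenticates the sender and restores the old tag."""
        if new_tag not in self.pending:
            return "rejected"
        old = self.pending.pop(new_tag)
        self.active.add(new_tag)
        if old is not None:
            self.suspended.discard(old)
            self.active.add(old)
        return "inbox"
```

After a successful resend both the old and the new tag are in `active`, mirroring the statement that both sub-addresses will be usable.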

This system employs bounces rather than rejecting ‘unsure’ emails during SMTP time because RIA needs to be able to deal with email sent via forwarders. The sender may not be informed of non-delivery if a forwarder first accepts the email and the email is then rejected during SMTP time by the recipient's email system.


4 A Near Comprehensive SPF Database Is Created From the IP Addresses of the MTAs in the Return Path of the Resent Bounces
4.1 An Independent SPF Database is Generated
With RIA the successful return of a bounce authenticates the sender. The sender's domain (now authenticated) and the servers listed in the Return-Path of the bounce will be added to a private SPF record (referred to hereafter as Receiver Generated SPF). A domain with a single MTA will likely need to resend only a single bounce; all future emails from all of the users of that domain will be successfully authenticated by consulting the Receiver Generated SPF database. Domains with more than one MTA will need to resend a bounce for each MTA before their Receiver Generated SPF record will be considered complete.


Example: The domain spamfizzle.com has 100 users and 2 MTAs (called MTA1 and MTA2). Unfortunately, as with so many domains, the administrator of spamfizzle.com never created an SPF record and all of the domain’s email is unauthenticated.

Spamfizzle.com does not employ RIA but its users send out thousands of emails to recipients that do participate in RIA. These RIA using recipients use a statistical filter and luckily almost all of the email coming from spamfizzle.com passes easily through the filter and directly reaches the inbox despite the fact that none of the spamfizzle.com email is authenticated.

At some point Michael@spamfizzle.com sends a single email to a single recipient (let's say the recipient is in Norway) but this time the statistical filter gives a rating of 'unsure'. This single email is bounced back to Michael@spamfizzle.com and Michael then manually resends the bounce. This resent bounce is sent via MTA1 and successfully reaches the Norwegian recipient.

Resending the bounce has authenticated spamfizzle.com. At this point other challenge/response systems would simply have the individual Norwegian recipient white-list Michael@spamfizzle.com; no one else in the world, unfortunately, benefits.

RIA now does something radically different. RIA takes the spamfizzle.com domain and the sending MTA (MTA1) listed in the return path of the bounce and places them together in a globally accessible private Receiver Generated SPF database.

Now Bob@spamfizzle.com sends an email via MTA1 to a Japanese recipient. This Japanese recipient accesses the globally accessible Receiver Generated SPF database and confirms that spamfizzle.com is known to send email via MTA1. The Japanese recipient has authenticated the email from Bob@spamfizzle.com based on the information garnered by the challenge resent by Michael@spamfizzle.com to a Norwegian recipient.

Emails sent via MTA2 will not be authenticated until the above process repeats itself and a single bounce is again resent via MTA2.

Every future non-forwarded email sent by any of the 100 users of spamfizzle.com will be authenticated as long as the recipients have access to this global Receiver Generated SPF database. This is all the result of just two people manually resending a single bounce.
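The spamfizzle.com example can be expressed as a small in-memory model of the Receiver Generated SPF database. This is a sketch only: the class and method names are hypothetical, MTAs are identified by IP address strings from the documentation range, and a real shared database would also need persistence, access control, and the safeguards discussed in later sections.

```python
from collections import defaultdict


class ReceiverGeneratedSPF:
    """Maps an authenticated sender domain to the MTAs seen resending its bounces."""

    def __init__(self) -> None:
        self._records = defaultdict(set)  # domain -> set of MTA identifiers

    @staticmethod
    def _domain(sender: str) -> str:
        return sender.rsplit("@", 1)[1].lower()

    def record_resent_bounce(self, sender: str, mta: str) -> None:
        """A bounce resent by `sender` arrived via `mta`: associate the two."""
        self._records[self._domain(sender)].add(mta)

    def is_authenticated(self, sender: str, mta: str) -> bool:
        """True if mail from the sender's domain via `mta` is already vouched for."""
        return mta in self._records.get(self._domain(sender), set())


if __name__ == "__main__":
    MTA1, MTA2 = "192.0.2.1", "192.0.2.2"  # illustrative addresses
    db = ReceiverGeneratedSPF()
    db.record_resent_bounce("Michael@spamfizzle.com", MTA1)  # Norwegian recipient
    print(db.is_authenticated("Bob@spamfizzle.com", MTA1))   # Japanese recipient checks
    print(db.is_authenticated("Bob@spamfizzle.com", MTA2))   # MTA2 not yet confirmed
```

The lookup is keyed on the domain rather than the full address, which is why one resent bounce from Michael benefits every other user of the same domain and MTA.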


4.1.1 The Flaws of SPF are Thus Solved
The two greatest flaws of current SPF are:
  1. Lack of a comprehensive SPF database - A single very large email provider utilizing RIA will quickly generate a near comprehensive and rapidly updating Receiver Generated SPF database.
  2. Forwarding without Rewriting the sender will not generate an SPF pass - Ham with a sub-address sent via a forwarding service will almost certainly directly reach the inbox.  Bounces, as a final backup, will authenticate forwarded mail that is classified as 'unsure' by the statistical filter.

 

RIA, via sub-addresses and bounces, also compensates for various other situations where SPF cannot be applied, such as when mail is submitted directly from a system with a dynamically-assigned IP address.


4.2 Even non-RIA Participating Email Providers Can Use the Receiver Generated SPF Database
This newly generated Receiver Generated SPF database can be utilized by all conventional email filtering services. Full use of RIA, however, is required to overcome all the flaws inherent to SPF.

4.3 Servers Listed in the Return-Path of the Original Email Will Also Be Added to the Receiver Generated Database 
The servers listed in the return path of the original email will often be the same as those listed in the return path of the resent bounce, but they can differ. For instance some MX servers may forward email to a different location such that the MTA in the resent bounce may never match the MTA used to send the original email.

The servers in the original email will also be added to the Receiver Generated SPF database but there are some caveats; the Return-Path of the original email may contain forwarding servers that are unaffiliated with the domain in question. This problem will be mitigated by only entering the server from the original email into the database after two separate bounces sent by two different RIA users are returned.


Example: Michael@spamfizzle.com sends unauthenticated mail via MTA3 to a French RIA user at ami@bonjour.fr. Michael’s email is bounced and then resent via MTA4 to ami@bonjour.fr. The Receiver Generated SPF database now associates MTA4 with spamfizzle.com so all future email sent from spamfizzle.com via MTA4 will be authenticated. Mail from MTA3 will still not be authenticated because at this point it is unclear whether MTA3 is really a forwarder set up by the user ami@bonjour.fr.

Bob@spamfizzle.com now sends an email via MTA3 to an Argentinean RIA user at Juan@mendoza.ar. This email is bounced and resent again via MTA4. Two separate bounces have now verified that MTA3 likely belongs to spamfizzle.com so this association is entered into the Receiver Generated SPF database. All future email sent from spamfizzle.com via MTA3 is now authenticated.
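The two-bounce rule for servers seen only in the original email can be sketched as follows. The requirement of two different RIA users comes from the text; the class name, method name, and in-memory storage are assumptions made for illustration.

```python
from collections import defaultdict

REQUIRED_CONFIRMERS = 2  # from the text: two bounces via two different RIA users


class OriginalPathTracker:
    """Tracks MTAs that appeared in an original email whose bounce was later resent."""

    def __init__(self) -> None:
        self._witnesses = defaultdict(set)  # (domain, mta) -> RIA users who saw it
        self.confirmed = set()              # (domain, mta) pairs entered in the database

    def bounce_returned(self, domain: str, original_mta: str, ria_user: str) -> bool:
        """Record one returned bounce; return True once the pair is confirmed."""
        key = (domain.lower(), original_mta)
        # A set ignores repeats from the same user, so one recipient's own
        # forwarder cannot confirm itself by appearing twice.
        self._witnesses[key].add(ria_user.lower())
        if len(self._witnesses[key]) >= REQUIRED_CONFIRMERS:
            self.confirmed.add(key)
        return key in self.confirmed
```

Counting distinct recipients rather than bounces is what separates a server that belongs to the sending domain from a forwarder that sits in front of a single recipient.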


4.4 RIA is Highly Beneficial Even if a Small Number of Domains Cannot be Authenticated via SPF
Various other proposals hold the promise of effectively combating spam but they impossibly require perfect compliance by all members of the email community before they are effective. RIA has no such requirement. RIA, therefore, cannot be negated by arguments pointing out atypical situations whereby a small percentage of domains cannot generate an SPF record via RIA.

 

Some legitimate email domains will somewhat foolishly use a dynamically assigned IP address for their MTA. These domains cannot be SPF authenticated although RIA’s other mechanisms can authenticate their email. There may be other special unusual circumstances that will prevent a domain from being SPF authenticated via RIA.

A compelling case can be made that the Receiver Generated SPF database will only be able to authenticate 98% and not 100% of domains, but this will still be an enormous improvement over the status quo. RIA’s other mechanisms will be required to authenticate email from the very small number of remaining domains.

4.4.1 Domains with Large Numbers of MTAs
Some unauthenticated email domains may have a large number of MTAs requiring a large number of bounces. A large domain with 200 MTAs will require at least 200 resent bounces before all of its email is authenticated, but this bounce rate is trivial if the domain sends a million emails every year.

Very large domains with huge farms of SMTP servers (Hotmail, Yahoo, etc.) already authenticate all of their email. It will only be a very rare domain that is small and unauthenticated yet sends its email via an enormous pool of servers.

4.5 RIA Will Identify Forwarding Servers Set Up By Individual RIA Users
Forwarding can break authentication via SPF. Most forwarding is orchestrated by the receiver and not the sender. Ideally receivers would inform their email systems about their authorized forwarders but in reality this often does not happen. Information garnered by RIA can be used to identify forwarders set up by individual users.

Email from reputable domains sent to a recipient via the recipient’s forwarder will on occasion get classified as ‘unsure’ as a result of SPF being broken. Successful return of this bounce by the reputable domain will prove that the forwarder is associated with the individual recipient. This forwarder will now be classified as a trusted forwarder for this individual RIA using recipient.


Example: Webmail.com is a large webmail provider that employs RIA. A user creates the account Peggy@webmail.com. Peggy then directs her old college email account to forward all email (via Forwarder1) to her new webmail.com account but she neglects to inform her webmail service that Forwarder1 is a trusted forwarder. An email sent by Amazon.com to Forwarder1 is forwarded to Peggy@webmail.com and it is marked as ‘unsure’ and bounced. The domain Amazon.com has an outstanding reputation but webmail.com was unable to distinguish between a spoofed email and one that was sent via a forwarder.

Amazon.com resends the bounce and the email is successfully received. Forwarder1 is now identified as a trusted forwarder for Peggy@webmail.com. Now Citibank.com sends an email to her old college account and Forwarder1 passes it on to Peggy@webmail.com. Citibank.com now passes SPF authentication as Forwarder1 is trusted.

A spammer operates the domain Dubious.com and sends unauthenticated mail via a zombie to Peggy@webmail.com. This mail receives an ‘unsure’ rating and is bounced and Dubious.com resends the bounce. Peggy@webmail.com receives this email and realizes it is spam but she simply deletes the email and neglects to report the email as spam. Referencing a domain reputation database reveals that Dubious.com has a questionable reputation. The spam zombie that sent the original email is not designated a trusted forwarder for Peggy@webmail.com.

 

With no additional effort RIA has identified a trusted forwarder for an individual user.
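The Peggy example suggests a simple per-recipient rule, sketched below. The reputation scale, the `GOOD_REPUTATION` threshold, and the class and method names are hypothetical; the text only states that a resent bounce from a reputable domain marks the relaying server as a trusted forwarder for that recipient, while one from a domain with a questionable reputation does not.

```python
from collections import defaultdict

GOOD_REPUTATION = 0.8  # hypothetical threshold on an assumed 0.0-1.0 scale


class ForwarderRegistry:
    def __init__(self, domain_reputation: dict) -> None:
        self._reputation = domain_reputation  # domain -> score, assumed external
        self._trusted = defaultdict(set)      # recipient -> trusted forwarding hosts

    def bounce_resent(self, recipient: str, sender_domain: str, relay: str) -> bool:
        """Called when a bounced message that arrived via `relay` is resent.

        Returns True if `relay` is now a trusted forwarder for `recipient`.
        """
        score = self._reputation.get(sender_domain.lower(), 0.0)
        if score >= GOOD_REPUTATION:
            self._trusted[recipient.lower()].add(relay)
            return True
        return False

    def is_trusted(self, recipient: str, relay: str) -> bool:
        return relay in self._trusted.get(recipient.lower(), set())
```

Trust is stored per recipient, so Forwarder1 being trusted for Peggy implies nothing about mail relayed to any other user.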




5 Auto-Resend Software Will Ensure that Senders Will Never Manually Interact with Bounced Emails
An essential aspect of RIA is the fact that it does not require everybody to adopt it before it works. A single large email service provider can implement RIA, establish a near perfect SPF database, and block nearly all spam without a single outside entity making a single change. As per the example in section 4.1, and as detailed in section 12, bounces will be so rare that most users will never see one. That being said, it is possible to make visible bounces even more of a rarity by having others implement updates to their email clients and/or server systems.

 

5.1 Bounces are Automatically Resent
Auto-Resend software will ensure that almost no one will see or be required to manually respond to the email seen in Figure 2. Auto-Resend software is a simple onetime update for webmail systems, email clients, and local mail servers. The sender’s Auto-Resend software will do the following:

  • The envelope information in the bounce seen in Figure 2 will be correlated with recent outgoing email. Bounces that correlate to recently sent emails will be automatically resent. Neither the sender nor the receiver will even be aware that the original email was blocked.

 

  • Addresses in the sender’s address book will be automatically updated with the most recent sub-address. If the contact is not listed in the address book then an ‘invisible’ address book will contain the correspondent’s sub-address information. Email subsequently sent to RIAuser@domain.com will automatically be sent using the sub-address on file.
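The two Auto-Resend behaviors listed above can be sketched as follows. This sketch assumes the bounce exposes the original Message-ID and recipient; the layout of the Figure 2 bounce is not reproduced here, and the class name, the seven-day window, and the resend-once rule are assumptions for illustration only.

```python
from typing import Optional

RECENT_WINDOW_SECONDS = 7 * 24 * 3600  # hypothetical definition of "recently sent"


class AutoResend:
    def __init__(self) -> None:
        self._outgoing = {}     # (Message-ID, recipient) -> time sent
        self.address_book = {}  # plain address -> most recent sub-address

    def note_outgoing(self, message_id: str, recipient: str, sent_at: float) -> None:
        """Remember a message this user actually sent."""
        self._outgoing[(message_id, recipient.lower())] = sent_at

    def handle_bounce(self, message_id: str, recipient: str,
                      new_sub_address: str, received_at: float) -> Optional[str]:
        """Return the address to resend to, or None if nothing correlates."""
        key = (message_id, recipient.lower())
        sent_at = self._outgoing.get(key)
        if sent_at is None or received_at - sent_at > RECENT_WINDOW_SECONDS:
            return None  # not ours, or too old: do not resend automatically
        del self._outgoing[key]  # resend at most once per original message
        self.address_book[recipient.lower()] = new_sub_address
        return new_sub_address
```

Refusing to resend bounces that match no recent outgoing message is the important property, since it prevents a spoofed sender's mailbox from authenticating mail its owner never sent.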

 

Greeting card sites and news sites allow people to fill in forms to send cards and articles to friends using their own return address. Bounces will always be sent to the sending system’s own envelope return address to ensure that the bounce will be successfully resent.


5.2 Universal Distribution of Auto-Resend Software is a Surprisingly Simple Thing to Achieve
Experience with SPF and DKIM demonstrates the impracticality of expecting nearly every administrator for each of the immense number of email networks across the globe to individually configure their systems to authenticate all of their email. Near universal distribution of Auto-Resend software, however, is a completely different situation. The reason is that the Auto-Resend update can be done either at the level of the email client or the local mail server via a routine software update by an extremely small number of entities.

5.2.1 Widespread Distribution at the Webmail and Email Client Level
Webmail is dominated by Hotmail with a 35.5% market share while Yahoo has 35.1% of the webmail market.7 The email client market is dominated by Microsoft with 58% market share in the corporate desktop client segment and nearly 49% share in the consumer desktop client segment. IBM Lotus follows Microsoft, with nearly 20% market share in the corporate desktop segment, and less than 3% market share in the consumer desktop segment.8

The cooperation of just three companies would bestow Auto-Resend software onto 70.6% of the global webmail population, 78% of the corporate desktop segment, and 52% of the consumer desktop segment. There are only a handful of email clients and only a few of these are major players. These major players are very highly motivated to tackle spam; this is in contrast to the countless hordes of email networks that have not lifted a finger to implement SPF or DKIM. There is little disincentive to implement Auto-Resend software as it is a one-time upgrade that remains dormant until needed.

5.2.2 Auto-Resend Distribution at the Local Mail Server Level
Adoption of Auto-Resend software at the level of the local mail server will supplement the email client update; Auto-Resend need only be present on either the email client or the server software. Ideally the server and client software will coordinate.

A survey of the visible mail servers on the internet reveals that 85% employ software provided by Sendmail, Microsoft, Exim, and Postfix.9 The server market, as with the webmail and client market, requires only a handful of entities to institute Auto-Resend in order to ensure that the bounces generated by RIA are never seen by users.

Servers that employ Bounce Address Tag Validation (BATV) will simply resend all verified bounces; otherwise the envelope information from incoming bounces will be correlated with a list of recent outgoing emails before the bounce is resent. Servers that do not employ BATV may still be able to correlate bounces with recently sent emails although this will not work 100% of the time if this information is stored on different servers. Ideally new versions of IMAP will better handle this function; it may take years but eventually this new standard will become prevalent. It is not within the scope of this paper to designate the specific mechanism by which Auto-Resend will function at the server level; this paper is merely pointing out that such upgrades are possible and that 100% adoption or efficacy of Auto-Resend function at the server level is neither necessary nor anticipated.

There are circumstances where the Auto-Resend software of both the server and the email client will fail to function. The user will then be obligated to manually resend the bounce. Failure of Auto-Resend to function does not result in failure of authentication; it merely results in a loss of transparency.

 

5.3 Rapid Deployment of Auto-Resend is Neither Required nor Expected

Auto-Resend is beneficial yet slowness to adopt Auto-Resend is tolerable as it is non-essential. Most people will not have Auto-Resend during the first year after a single major email provider institutes RIA. It is reasonable to assume that most people will be able to acquire it within 5-10 years. Time, in this case, is on the side of RIA.


6 Suspicious Domains Will Be Neutralized By CAPTCHA Encoded Sub-addresses
RIA will authenticate practically every email that is potentially spam via the combination of a comprehensive Receiver Generated SPF database, resent bounces, individually identified trusted forwarders (section 4.5), and the widespread use of sub-addresses. Domains, once authenticated, can be classified almost without ambiguity as either spam or ham domains. There ends up being a smattering of domains with intermediate reputation scores. Many of the messages in this in-between category are from legitimate bulk senders, and their lower reputations are a reflection of less-than-ideal sending practices.10

Spam domains will be black-listed. Domains that are probably (but not unambiguously) spam domains will become classified as ‘suspicious.’ Suspicious domains will receive the bounce seen in Figure 3.



Figure 3: CAPTCHA laden bounce for a suspicious sender.

This bounce requires human intervention to decode the sub-address within the CAPTCHA and manually resend the message. It will be very rare for any legitimate sender to encounter such a bounce; legitimate bulk senders with less-than-ideal sending practices will be impacted the most. Legitimate senders who have had their domains hijacked will also be impacted. The operators of the hijacked domains will need to cleanse their servers to get off the suspicious list.

The suspicious list will contain not only domain names but also any IP address that might be compromised. Suspicious servers will be avoided if possible; otherwise the bounce from Figure 3 will be sent to any domain that uses a hijacked server.


6.1 Updated Email Clients Will Reformat CAPTCHA Bounces to Make Resending Easier
Legitimate senders will rarely encounter a CAPTCHA bounce, but the task of resending one can be simplified by another one-time email client update to reformat the bounce seen in Figure 3 so that it appears as seen in Figure 4.



Figure 4: CAPTCHA bounce reformatted by the email client for convenience.

6.2 Spammers Cannot Circumvent the CAPTCHA
Attempting to circumvent the CAPTCHA contained within a bounce is vastly more challenging than the more common practice of circumventing a CAPTCHA that is used to guard against automated webpage registrations. To acquire the CAPTCHA the spammer must first send a non-spoofed email and must have the ability to receive and process the resulting bounce. Attempting to solve the CAPTCHA will again require sending a second non-spoofed email (the sub-address will link this second email to the sender of the first email even if the second sender is ‘spoofed’).

 

6.2.1 Automated Decryption of the CAPTCHA can be Prevented
The CAPTCHAs employed by Microsoft and Yahoo now yield to automated attack 30-35% of the time.11 If needed, the 3-D CAPTCHA, a CAPTCHA that is likely invulnerable to any practical attempt at automated attack, can be employed. [The 3-D CAPTCHA is explained in detail at http://spamfizzle.com/CAPTCHA.aspx]. A user will only be required to decode 2 characters; the remaining characters of the sub-address will be given as plain text. A decryption rate of 1% will force a spammer accustomed to sending 100 million spam a day using spoofed domains to instead send 10 billion spam a day from non-spoofed domains, an impossible task.

Sweatshop labor to decode CAPTCHAs is also not feasible, as an expense of as little as a tenth of a cent per decoded CAPTCHA will impose a crushing financial burden on any spammer expecting to decode tens of millions of CAPTCHAs a day. The urban legend of using a pornographic website to get others to solve CAPTCHAs for spammers is laughable, as no spammer could possibly get more than an insignificant number of people to visit their website and solve problems for them.
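The arithmetic behind both claims in this section can be checked with a few lines. The function names are hypothetical, and the 10 million figure is an assumption taken as the low end of "tens of millions"; the other inputs come from the text.

```python
def attempts_needed(target_deliveries: int, solve_rate: float) -> float:
    """Non-spoofed messages required so that `target_deliveries` pass the CAPTCHA."""
    if not 0 < solve_rate <= 1:
        raise ValueError("solve_rate must be in (0, 1]")
    return target_deliveries / solve_rate


def daily_labor_cost(captchas_per_day: int, cost_per_captcha: float) -> float:
    """Daily cost, in dollars, of paying people to decode CAPTCHAs."""
    return captchas_per_day * cost_per_captcha


if __name__ == "__main__":
    # 100 million deliveries a day at a 1% automated solve rate.
    print(f"{attempts_needed(100_000_000, 0.01):,.0f} messages per day")
    # 10 million CAPTCHAs a day at a tenth of a cent each.
    print(f"${daily_labor_cost(10_000_000, 0.001):,.0f} per day in labor")
```

At a tenth of a cent per CAPTCHA, every 10 million decoded CAPTCHAs cost about $10,000, and that cost recurs daily.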

 

6.3 The Extreme Rarity of the CAPTCHA Makes It Tolerable

RIA, even without the CAPTCHA, will authenticate all questionable email. Bulk senders with poor sending practices are the most likely to encounter a CAPTCHA; 'normal' email senders are almost guaranteed never to see a CAPTCHA unless their systems are under spammer control.

6.3.1 People Who Use Graphics Incompatible Email Clients are No Worse Off
A minority of people use graphics incompatible email clients and they cannot view the CAPTCHA. These individuals are only sent a CAPTCHA because their domains have a poor reputation and the email they sent failed the analysis of a content filter. Much of this email that was bounced would have been junked by a conventional email filter. These graphics incompatible email clients may have no provision for viewing a CAPTCHA, but they certainly do not have any provision for when their sent emails are placed directly into the spam folder. The users of graphics incompatible email clients will likely prefer dealing with a RIA generated CAPTCHA bounce to the likely prospect of their email unknowingly disappearing into a spam folder.

7 Neutralizing Possible Spammer Circumvention of RIA
Spammer botnets have turned tens of millions of computers into zombies sending out massive amounts of spam. Botnets are most useful to the spammer for sending unauthenticated spam, but they do little to deliver authenticated spam. RIA will neutralize existing botnets by preventing unauthenticated spam from reaching inboxes.

7.1 A Theoretical Way Future Spambots May Try To Circumvent RIA
Future generations of botnets will need to adopt new methods. Future spambots might take over a user’s personal computer and intercept an incoming bounce sent to a user’s email address. This bounce can be forwarded to another zombie computer. The zombie computer will resend the bounce. The Receiver Generated SPF database is erroneously updated with the domain of the legitimate original sender being associated with the zombie computer as an MTA. The zombie can now send apparently authenticated spam using the innocent user’s domain. This subterfuge will permit spammers to send only a very limited amount of spam for the following reasons:

1. All major domains (Yahoo, AOL, Earthlink, Hotmail, etc.) and smaller domains that have published SPF records with “-all” designations will not be entered into the Receiver Generated SPF database. Thus massive numbers of personal computers will be completely immune to this gambit. Smaller domains with SPF records with “?all” designations can have additional servers added to them in the Receiver Generated SPF database but more stringent criteria (such as multiple confirmatory resent bounces over a greater period of time) will be required.


2. Domain/server combinations added to the Receiver Generated SPF database will be assigned a reputation score and an activity score. The activity score of the Receiver Generated SPF listing will rise every time this database is accessed to authenticate an email. The permitted activity of a domain/MTA listing is determined by the reputation score. In this way a massive simultaneous zombie mailing exploiting a single falsified Receiver Generated SPF domain/MTA listing will be foiled as only a finite number of Receiver Generated SPF checks will be allowed.
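The two safeguards above can be sketched as an eligibility rule plus a budget of permitted checks. All numeric values, the linear budget formula, and the names used are hypothetical. The text does not say how a "~all" record should be handled; treating it like "?all" here is an assumption of this sketch.

```python
from typing import Optional

STRICT_CONFIRMATIONS = 3           # hypothetical "more stringent criteria"
BASE_BUDGET = 1_000                # hypothetical checks allowed at zero reputation
BUDGET_PER_REPUTATION = 1_000_000  # hypothetical extra checks at full reputation


def confirmations_required(published_spf: Optional[str]) -> Optional[int]:
    """Resent bounces needed before a server may be added; None means never."""
    if published_spf is None:
        return 1  # no official record: a single resent bounce suffices
    for term in published_spf.lower().split():
        if term.lstrip("+-~?") == "all":
            if term.startswith("-"):
                return None  # "-all": the domain's own record is authoritative
            return STRICT_CONFIRMATIONS
    return STRICT_CONFIRMATIONS


class Listing:
    """One domain/MTA pair in the Receiver Generated SPF database."""

    def __init__(self, domain: str, mta: str, reputation: float = 0.0) -> None:
        self.domain = domain
        self.mta = mta
        self.reputation = reputation  # assumed to lie between 0.0 and 1.0
        self.activity = 0             # rises with every authentication check

    def permitted_activity(self) -> int:
        return BASE_BUDGET + int(self.reputation * BUDGET_PER_REPUTATION)

    def check(self) -> bool:
        """Consume one check; refuse once the reputation-based budget is spent."""
        if self.activity >= self.permitted_activity():
            return False
        self.activity += 1
        return True
```

A freshly falsified listing starts with no reputation, so a mass mailing through it exhausts the budget quickly, while a listing that has earned reputation over time is allowed proportionally more checks.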


Example: Assume that 100 million PCs exist with the aforementioned zombies, and each PC has one user with an email account that is able to post to the Receiver Generated SPF list (i.e. the email domain does not already have an official SPF record). Each zombie is able to send 1000 spam before the falsified domain/server listing is delisted and the offending email address is banned from making new postings to the Receiver Generated SPF database. 100 billion spam are sent before this enormous botnet permanently burns itself out. Over 60 billion spam were sent every day in October 2006,12 so this apparently enormous botnet would exhaust itself in about two days if existing spam levels were maintained.
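The figures in this example can be verified directly. The function name is hypothetical and the inputs are the numbers stated in the example.

```python
def botnet_burnout(zombies: int, spam_per_zombie: int, daily_spam_volume: int) -> tuple:
    """Return (total spam sent before burnout, days that total lasts at the given volume)."""
    total = zombies * spam_per_zombie
    return total, total / daily_spam_volume


if __name__ == "__main__":
    total, days = botnet_burnout(
        zombies=100_000_000,               # infected PCs in the example
        spam_per_zombie=1_000,             # spam sent before each listing is removed
        daily_spam_volume=60_000_000_000,  # October 2006 daily volume
    )
    print(f"{total:,} spam in total, lasting {days:.2f} days")
```

The total is 100 billion messages, which is less than two days of traffic at the 2006 daily volume.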


The email address that was exploited by the zombie to generate a falsified Receiver Generated SPF listing will no longer be used to generate new Receiver Generated SPF listings. Interestingly the email address of every person whose computer was infected with such a spambot will be known. These users can be warned about their infection via email.

7.1.1 Different Email Service Providers Will Contribute to a Single Universal Receiver Generated SPF Database
Email providers that use RIA will place their reputation data in a single universal database that will be similar to preexisting proposals for a standardized domain reputation report database.13 This database will have unique aspects: 

  • Reputation will be assigned to domain/MTA pairs rather than just to domains.

 

  • Spammers may have some control over domain/MTA combinations but the spammers may lack the ability to force these domains to resend spam bounces. Other spammer influenced domain/MTA combinations may send spam and successfully resend the subsequent spam containing bounces. The inability of spammers to resend bounces may indicate that a zombie PC is spoofing other legitimate users of an MTA but the MTA itself is not controlled by the spammer (the bounces are returning towards non-spammer controlled computers). The database will distinguish between partially compromised domain/MTA combinations that can be neutralized via a Figure 2 bounce and fully compromised ones that require a Figure 3 bounce.

 

  • The database will contain forensic data that will be used to identify the compromised servers and email addresses that resulted in a falsified listing. A user who sends email from a zombie PC will have his email address flagged; bounces subsequently sent to this specific user's email address will not be used to generate future listings in the Receiver Generated SPF database. The person who used the compromised PC can then be warned via email that his PC or local server is controlled by a spammer.
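One possible shape for such a database is sketched below. It is only an illustration of the three aspects listed above: the class, method, and field names are hypothetical, and the addresses use reserved example domains and documentation IP ranges.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    # Forensic data: addresses whose resent bounces produced this listing.
    confirmed_by: set = field(default_factory=set)
    # "none", "partial" (a Figure 2 bounce suffices) or "full" (a Figure 3 bounce is needed).
    compromise: str = "none"


class ReceiverGeneratedSpfDb:
    """Hypothetical in-memory model; reputation is keyed by (domain, MTA) pairs."""

    def __init__(self):
        self.listings = {}    # (domain, mta) -> Listing
        self.flagged = set()  # addresses barred from generating new listings

    def record_resent_bounce(self, domain, mta, address):
        """List the pair unless the resending address has been flagged."""
        if address in self.flagged:
            return False
        self.listings.setdefault((domain, mta), Listing()).confirmed_by.add(address)
        return True

    def mark_compromised(self, domain, mta, level):
        """Record whether a pair is partially or fully compromised."""
        if level not in ("partial", "full"):
            raise ValueError("level must be 'partial' or 'full'")
        self.listings.setdefault((domain, mta), Listing()).compromise = level

    def is_authenticated(self, domain, mta):
        listing = self.listings.get((domain, mta))
        return listing is not None and listing.compromise == "none"

    def delist_falsified(self, domain, mta):
        """Remove a falsified listing, flag the addresses behind it, and
        return them so that their owners can be warned by email."""
        listing = self.listings.pop((domain, mta), None)
        if listing is None:
            return set()
        self.flagged |= listing.confirmed_by
        return set(listing.confirmed_by)


db = ReceiverGeneratedSpfDb()
db.record_resent_bounce("example.com", "192.0.2.1", "alice@example.com")
to_warn = db.delist_falsified("example.com", "192.0.2.1")
```

Keying on the pair rather than the domain alone means that one compromised MTA does not taint mail sent from the same domain through another server.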

 

7.2 Spam Sent via Hotmail, Yahoo, and Gmail
These three free webmail services are the only ones with enough capacity to permit spammer exploitation in a globally important way; they therefore deserve special attention.

Email from these services is already authenticated, so bounces will instead serve to alert these services that the account is possibly being used to send spam. These webmail services might automatically resend 10 bounces a day per account, but receipt of a greater number within a single day will be considered suspicious and the account holder will be required by the webmail service to pass a humanity test before any more bounces are resent. Alternatively these services might attach a tag to the first 10 outgoing emails of the day that indicates that the user is trusted and any bounces will definitely be resent; naturally there will be little need to generate bounces for these emails labeled as trusted by the webmail service.
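The first policy described above (automatic resending up to a daily limit, then a humanity test) could be sketched as follows. The limit of 10 comes from the text; the class and method names are hypothetical, and how a webmail service would actually track accounts is an assumption.

```python
from collections import defaultdict

DAILY_AUTO_RESEND_LIMIT = 10  # figure suggested in the text


class BounceResendPolicy:
    """Hypothetical per-account policy for a webmail service."""

    def __init__(self, limit=DAILY_AUTO_RESEND_LIMIT):
        self.limit = limit
        self.received = defaultdict(int)  # (account, day) -> bounces received
        self.verified = set()             # (account, day) pairs that passed the test

    def on_bounce(self, account, day):
        """Return the action to take for one incoming RIA bounce."""
        key = (account, day)
        self.received[key] += 1
        if self.received[key] <= self.limit or key in self.verified:
            return "resend"
        return "require_humanity_test"

    def passed_humanity_test(self, account, day):
        self.verified.add((account, day))


policy = BounceResendPolicy()
actions = [policy.on_bounce("user@example.com", "2007-03-21") for _ in range(11)]
```

In this sketch the first ten bounces of the day are resent silently and the eleventh is held until the account holder passes the test.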

These webmail services will be notified when RIA users report spam, thus allowing these webmail services the chance to self-police. If these large webmail providers fail to self-police then CAPTCHA-laden bounces can be sent to these suspicious webmail accounts.

In addition, users of Hotmail, Yahoo, or Gmail accounts can be blocked from seeing the sub-addresses sent back with the bounces. Spammers will therefore not be able to harvest the sub-addresses from the bounces and use them to send email via spoofed addresses.

8 A Powerful Tool against Phishing
Phishing will be almost completely blocked along with the rest of spam. Domains commonly associated with phishing (e.g. Paypal.com, Citibank.com) will never be permitted to use a sub-address that was issued to a different domain. In addition the original source of a shared sub-address will be displayed prominently within the email as in Figure 5.




Figure 5: Email as seen when the sender shares a sub-address that was given to another correspondent.

The user will be extremely suspicious as to how and why his bank acquired his email address from his uncle.

Not allowing third parties to use sub-addresses would be even more effective, but RIA is designed to make bounces a rarity. Experience with a more conventional sub-address generating anti-spam system suggests that 90-97% of all ham will use a sub-address that is not used by spammers.[14] Sharing sub-addresses, therefore, facilitates the unhindered receipt of ham originating from strangers.

9 Misdirected Bounces Generated By RIA Will Have a Negligible Impact on Innocent Email Users and Systems
Misdirected indiscriminate bounces are a major problem, so some may fear that a widely enacted anti-spam system based on bounces will be disastrous. A report available from Ironport[15] details this very real problem:

  • 9% of global email traffic is misdirected bounce mail, 71% is spam/viruses/phishing, and 20% is legitimate.
  • Less than 0.5% of bounce messages make it through to the end user.
  • 20% or more of what a spammer sends is bounced because of invalid addresses.
  • There are 4.5 billion misdirected bounce messages per day. 10% of these have valid addresses resulting in 450 million being directed to mailboxes each day.

 

The impact can be seen in a model where we assume that, as determined by filter score, the lowest rated 4% of ham and the highest rated 4% of spam get bounced. We will also assume that 50% of the global email population employs RIA.

9.1 Effect on global email traffic

(4% spam bounced)*(71% global spam)*(50% global RIA participation) = 1.42% global increase in bounces generated by spam

(4% ham bounced)*(20% global ham)*(50% global RIA participation) = 0.4% global increase in bounces generated by ham

Conclusion: 1.82% increase in global email traffic.
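The estimate can be reproduced with a few lines of arithmetic. This is only a restatement of the model's assumptions; the function and variable names are illustrative.

```python
def added_bounce_share(bounce_rate, traffic_share, ria_participation):
    """Extra bounces as a fraction of all global email traffic."""
    return bounce_rate * traffic_share * ria_participation


# Model assumptions: 4% of spam and 4% of ham bounced, 50% RIA participation,
# spam is 71% and ham is 20% of global email traffic.
from_spam = added_bounce_share(0.04, 0.71, 0.50)
from_ham = added_bounce_share(0.04, 0.20, 0.50)
total_increase = from_spam + from_ham

print(f"spam: {from_spam:.2%}, ham: {from_ham:.2%}, total: {total_increase:.2%}")
```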

9.2 Effect on DDoS
Assume a spammer sends 100 million spam using the return address of a single company. We will guess that only 50% of the spammer’s emails are sent to real email addresses.

(100 million)*(20% bounced because of invalid addresses) = 20 million misdirected emails hitting a single company’s systems (this is what is currently occurring).

(100 million spam)*(50% of spam that targets a real address)*(50% of global population using RIA)*(4% spam bounce rate) = one million additional emails will hit the company’s system, resulting in a total of 21 million, a 5% increase in bounce volume.

Conclusion: 5% increase in the volume of a DDoS attack.
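The same calculation as a short sketch, using only the figures assumed in this section (the constant names are illustrative):

```python
SPAM_SENT = 100_000_000
INVALID_ADDRESS_RATE = 0.20   # spam bounced today because the address is invalid
REAL_ADDRESS_RATE = 0.50      # spam that targets a real address
RIA_PARTICIPATION = 0.50      # share of the global population using RIA
SPAM_BOUNCE_RATE = 0.04       # share of spam that RIA bounces

# Bounces the spoofed company already receives without RIA.
existing_bounces = SPAM_SENT * INVALID_ADDRESS_RATE
# Additional bounces generated by RIA users.
ria_bounces = SPAM_SENT * REAL_ADDRESS_RATE * RIA_PARTICIPATION * SPAM_BOUNCE_RATE
relative_increase = ria_bounces / existing_bounces

print(f"{existing_bounces:,.0f} existing, {ria_bounces:,.0f} added, "
      f"{relative_increase:.0%} increase")
```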

In practice, proper filtering is unlikely to mistake an unauthenticated email sent from a dubious server for a legitimate email from a Fortune 500 company. The true increase in the volume of bounce DDoS attacks for large companies is likely much smaller or almost non-existent.

An individual user whose address was spoofed on a large scale by spammers (Joe Jobbing) will also only see a 5% increase in erroneous bounce volume.

9.3 The Average Effect of Bouncing Spam on the Inboxes of Third Parties

(50% participation)*(4% spam bounce rate)*(10% of spam that spoofs an existing ‘From’ address) = 0.2% increase in global spam directed at real addresses.

Conclusion: 0.2% average increase in spam directed at real users. This increase will never be seen by users of RIA and any other system that uses BATV or other means to filter out erroneous bounces.

The true burden of erroneous bounces is caused by indiscriminate bouncing. Highly selective bouncing, even when implemented by a massive segment of the email community, is relatively insignificant.

10 Implementing RIA, Elimination of the Junk Mail Folder, and the Finite Resources of Spammers
10.1 A Model of How RIA Might Be Implemented

1. One or more large email service providers will announce their intention to institute RIA, and a few of the largest email client and mail server software developers will implement Auto-Resend.


2. The email service provider sends out sub-addresses with routine outgoing email for 3 months before bouncing a very small percentage of email. Initially bounces will only be sent in response to emails sent using email clients or servers updated with Auto-Resend (see section 10.2). The Receiver Generated SPF database will now start to form while trusted forwarders for individuals are identified.


3. The prevalence of Auto-Resend increases and a near comprehensive Receiver Generated SPF database forms. Almost all ham is authenticated via this nearly comprehensive Receiver Generated SPF database. A strict policy of excluding all non-authenticated unsure email from the inbox can now be adopted.


4. The number of false positives being placed in the junk mail folder is markedly reduced. The junk mail folder is eliminated (see section 10.3). Users no longer have the option of viewing spam.

10.2 At First Only Emails Sent from Auto-Resend Enabled Senders Should Be Bounced
RIA can be implemented with almost no disruption by initially only sending bounces in response to email sent using software updated with Auto-Resend. For example a large amount of spam purports to be sent via Outlook; email sent from an updated Outlook will be aggressively bounced with little concern that the sender will manually interact with the bounce. Spoofed email purporting to have been sent via an updated version of Outlook will, subsequently, be blocked almost completely. Spammers can naturally purport to send their email from software not updated with Auto-Resend, but with the passage of time it will become difficult for spammers to purport that all of their spam is sent via increasingly obsolete or esoteric brands of software.
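A minimal sketch of this early-deployment rule follows. It assumes the receiver identifies the claimed sending software from the X-Mailer header and keeps a list of agents known to support Auto-Resend; both the list and the agent name "ExampleMail 2.0" are hypothetical, as is the choice to fall back to the existing filter for all other unsure mail.

```python
from email import message_from_string

# Hypothetical list of client versions known to ship with Auto-Resend.
AUTO_RESEND_AGENTS = {"ExampleMail 2.0"}


def early_rollout_action(raw_message, classification):
    """Decide what to do with a message during the initial rollout.

    classification is the statistical filter's verdict:
    "ham", "spam", or "unsure".
    """
    if classification == "ham":
        return "inbox"
    if classification == "spam":
        return "junk"
    agent = message_from_string(raw_message).get("X-Mailer", "").strip()
    if agent in AUTO_RESEND_AGENTS:
        return "bounce"           # the sender's software resends transparently
    return "existing_filter"      # no bounce yet for non-updated senders


updated = "From: a@example.com\nX-Mailer: ExampleMail 2.0\n\nhello"
legacy = "From: b@example.com\nX-Mailer: OldMail 1.0\n\nhello"
```

A spoofed message that merely claims an updated agent is bounced and never resent, which is why such spoofing is blocked almost completely.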

10.3 Elimination of the Junk Mail Folder
Elimination of the junk mail folder is essential if the sending of spam is to be made futile. People who respond to spam with subjects such as ‘V1@gr@’ or ‘CH3AP S0FTW4RE’ already know the message is spam; the problem is they do not care. Users should not have the option of perusing the junk mail folder; the limitations of current anti-spam efforts make the junk mail folder a necessary evil.

10.4 Spammer Resources Change from the Infinite to the Finite
Completely blocking all non-authenticated spam from the inbox and eliminating the junk mail folder will force spammers to only send authenticated spam. Spammer options to successfully deliver authenticated spam are finite. Zombie networks to falsify Receiver Generated SPF listings are a surprisingly finite resource (see section 7.1).

Spammers can intercept valid sub-addresses (classified as a form of authentication in RIA), but this also is a finite resource. The theoretical maximum number of intercepted sub-addresses in a day obviously cannot exceed the number of valid emails sent in a day. A spammer only gets one attempt to use a stolen sub-address before the sub-address is deactivated.

The traceability of the sub-address will allow for the near instant identification and suspicious-listing of any computer infested with an address harvesting spambot. It would be generous to assume that spammers could harvest even a few million sub-addresses a day in an environment where the source of these leaks can be instantly identified and neutralized.

11 Redundancy Makes RIA Practical, Effective, and Non-Disruptive
11.1 The Benefit of Redundancy

Redundancy within RIA greatly decreases the likelihood that a sender will ever see a bounce.

  • RIA stands on the shoulders of conventional anti-spam technologies; only the small proportion of email classified as ‘unsure’ is affected by RIA.
  • Sub-addresses are non-essential for RIA to operate. Ham sent without a sub-address is still very likely to directly reach the inbox, while it is effectively guaranteed that ham sent with a sub-address will directly reach the inbox (even if the ham was forwarded).
  • The Receiver Generated SPF database is the centerpiece of RIA and it will transparently authenticate nearly all ham, yet sub-addresses, bounces, and Auto-Resend can still authenticate all unsure email even in the total absence of this database.
  • Auto-Resend can very easily be deployed to most of the global email population (see section 5.2), yet any anti-spam system that is dependent on universal deployment of an update to client software is doomed to failure. Auto-Resend is nonessential; it merely improves the transparency of RIA.

 

11.2 Sub-Addresses Employed in a Completely Unique Manner Further Stop Spam and Enhance Ham Delivery

RIA can easily be deployed without sub-addresses as depicted in figure 1A.  Sub-addresses will not accelerate the formation of the Receiver Generated SPF database.  Sub-addresses will, however, help compensate for the failings of SPF such as with email sent via untrusted forwarders.  Without sub-addresses RIA will need to bounce these emails (but only if they are determined to be ‘unsure’ via a statistical filter).

 

Sub-addresses will also help to classify the minute number of authenticated emails that would otherwise have been classified as ‘unsure’.  The sub-address proves that a prior relationship exists between the sender and the recipient.

 

11.2.1 Conventional Sub-Address Based Systems Do Not Decrease Spam

Surprisingly existing sub-address based email systems do not decrease spam because users are unwilling to cut off legitimate contacts by black-listing compromised sub-addresses. Sub-addresses are typically used to white-list email – when compromised the reaction is to apply standard email filtering instead of black-listing. The net effect of this is that sub-addresses, ironically, do not decrease the amount of spam received. The only benefit of sub-addresses is to decrease the false-positive rate of conventional email filters.

Users of existing sub-address systems can only block spam if they are willing to manually black-list every compromised sub-address (this process can alternately be described as white-listing uncompromised sub-addresses and challenging all other mail).[16] Such black-listing may appeal to niche users but it is undesirable as a universal anti-spam solution.


 

11.2.2 Sub-Addresses as Employed by RIA Result in a Profound Decrease in the Spam Burden

RIA manages sub-addresses in a unique and markedly superior manner. Unlike existing systems, RIA never uses sub-addresses to black-list or white-list email. Instead sub-addresses are only used to differentiate between two levels of email filtering.

Conventional filtering can often identify 95% of ham with certainty (filtering with the near universal authentication afforded by RIA will be far more effective – this advantage is addressed later in this section). With RIA this 95% of ham will directly reach the inbox regardless of the presence of a sub-address. The ‘unsure’ remaining 5% will go to the inbox if a sub-address is present, otherwise it will be bounced.

Conversely conventional filtering will often identify 95% of spam with certainty. This 95% of spam will be junked regardless of the presence of a sub-address. Only the ‘unsure’ remaining 5% will be affected by the presence of a sub-address; absence of a sub-address will cause the ‘unsure’ spam to bounce while a valid sub-address will allow the spam to sneak into the inbox.

The automated deactivation of compromised sub-addresses is a crucial and unique aspect of RIA’s use of sub-addresses. Spammers can intercept a valid sub-address but 95% of the time this sub-address will be deactivated before a single piece of spam can be delivered. On this basis alone RIA results in a 20-fold decrease in spam seen by the user as compared to a conventional sub-address system where users are willing to manually black-list compromised sub-addresses.

Remember that this 20-fold decrease in spam is modeled on the 95% effectiveness of a conventional filter – in reality the decrease in spam will be far greater than 20-fold as the RIA filter will be greatly enhanced by a near comprehensive Receiver Generated SPF database, Auto-Resend, and the identification of trusted forwarders for individuals.

Auto-deactivation is intolerable for existing sub-address based systems as users are not willing to blindly black-list existing correspondents. Auto-deactivation is preferable for RIA as users are never black-listed but instead are subjected to a slightly more aggressive email filter. With RIA the auto-deactivation of sub-addresses will commonly be considered tolerable if the number of hams bounced is similar to the false-positive rate of traditional filters.
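The two-level filtering and auto-deactivation described in this section can be sketched as follows. The class name, the three classification labels, and the opaque token used as a sub-address are illustrative assumptions; the text does not specify how sub-addresses are encoded or stored.

```python
class SubAddressFilter:
    """Hypothetical model of RIA's use of sub-addresses."""

    def __init__(self):
        self.active = set()  # sub-addresses that have not been deactivated

    def issue(self, sub_address):
        self.active.add(sub_address)

    def handle(self, classification, sub_address=None):
        """Route one message given the filter verdict and any sub-address."""
        valid = sub_address in self.active
        if classification == "spam":
            # Certain spam is junked regardless of the sub-address, and a
            # sub-address that arrives with it is deactivated automatically.
            self.active.discard(sub_address)
            return "junk"
        if classification == "ham":
            # Certain ham reaches the inbox with or without a sub-address.
            return "inbox"
        # Only 'unsure' mail is affected by the sub-address.
        return "inbox" if valid else "bounce"


f = SubAddressFilter()
f.issue("tok-1")                       # handed to a correspondent
first = f.handle("unsure", "tok-1")    # legitimate use
stolen = f.handle("spam", "tok-1")     # spammer uses the intercepted token
later = f.handle("unsure", "tok-1")    # token is now deactivated
```

Note that after deactivation the original correspondent is not black-listed: certain ham still reaches the inbox, and only unsure mail is bounced for resending.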


11.3 Few Downsides to Sub-Addresses

Sub-addresses have only a few minor drawbacks. Non-updated client software that accesses a shared address book may contain duplicated listings for an individual; this will not result in any lost emails but the individual might receive a duplicate mailing. A simple software update will resolve this issue.

Sub-addresses will present a minor cosmetic change in the address seen in the ‘From’ field of incoming email. RIA users will not, however, need to memorize and distribute sub-addresses during face to face meetings or when filling out web forms. Typical RIA users will always manually distribute their conventional email address without needing to append a sub-address to it.

RIA without sub-addresses is acceptable despite the higher bounce rate. The almost total lack of substantive downsides to using sub-addresses, however, makes this increase in bounces unnecessary.


 

11.4 RIA Will Benefit Mailing Lists and Web Forms

Other systems that utilize sub-addresses and bounces can be antagonistic to mailing lists and email sent on the basis of an address filled into a web form. RIA will enhance mail delivery from these sources.

RIA only impacts email that is classified as ‘unsure’ so mailing lists that currently never have trouble reaching the inboxes of recipients will almost certainly directly reach the inboxes of RIA users even if sub-addresses are not used – the same is true for businesses that send email based on addresses entered into web forms.

Conventional filtering does not really ‘work’ for ham classified as ‘unsure’ as much of this ham will be placed in the spam folder. The aforementioned attributes of RIA guarantee that the percentage of ‘unsure’ emails will be far smaller than the percentage seen with conventional filtering. The number of bounced emails may be no more than the number of emails that would have been junked by conventional email filters; with RIA the senders always have the ability to resend the bounce and achieve a 100% delivery rate while with conventional anti-spam systems there is no recourse when ham is placed in the junk mail folder.

12 Comparison with Challenge/Response
Challenge/Response (C/R) systems are able to authenticate all email. C/R may be the most effective existing method for blocking spam[17], but its drawbacks prevent it from becoming an email standard. C/R, unfortunately, places too much of a burden on senders and delivers too many erroneous bounces to innocent third parties. RIA employs multiple mechanisms to avoid these failings of C/R.

 

12.1 A Model of RIA against C/R and Conventional Filtering
Section 9 detailed how the highly selective bounces sent by RIA would have a relatively insignificant impact on innocent third parties. This section will model and compare the experience of a single small business with a single domain and a single MTA sending unauthenticated ham to users. Assume this small business initiates 10,000 email correspondences to users in each of the following categories:

1. Conventional C/R – The sender must respond to all 10,000 challenges.


2. A selective C/R system that junks all likely spam and only challenges the 5% of ham comprising the initial correspondences from strangers – The sender must respond to 500 challenges.


3. Conventional filter with a 0.5% false positive rate – The sender does not have to deal with any challenges but 50 of the sender’s emails are junked. A significant amount of spam slips by the filter and reaches the recipients.


4. RIA without Auto-Resend – A single manually resent bounce places the sender in the Receiver Generated SPF database; future non-forwarded emails get through as the sender maintains a pristine reputation. Assume that 5% of the emails are forwarded in a way that breaks SPF, and that 50% of these emails do not have a sub-address, and that 10% of these remaining emails are classified as unsure and bounced. The sender must manually respond to 26 challenges.


5. RIA with 80% Auto-Resend implementation – The sender must manually respond to 5 emails.

 

RIA without Auto-Resend will issue 385 times fewer challenges than conventional C/R and 19 times fewer challenges than a highly sophisticated C/R system. A high quality conventional filter will junk about 2 hams for every one ham that RIA will bounce back to the sender.

 

RIA with 80% implementation of Auto-Resend will expose senders to 2000 times fewer challenges than conventional C/R and 100 times fewer challenges than a selective C/R system. A high quality conventional filter will junk about 10 hams for every one bounce that the sender will see.
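The figures in this model can be reproduced directly from the stated assumptions. The sketch below uses illustrative variable names and rounds to whole emails, as the text does.

```python
CORRESPONDENCES = 10_000

conventional_cr = CORRESPONDENCES                 # every email is challenged
selective_cr = CORRESPONDENCES * 5 // 100         # 5% of ham is challenged
filter_junked = CORRESPONDENCES * 5 // 1000       # 0.5% false positive rate

# RIA without Auto-Resend: one resend to get listed, plus bounces for
# forwarded (5%), sub-address-free (50%), unsure (10%) emails.
forwarded = CORRESPONDENCES * 5 // 100
without_sub_address = forwarded * 50 // 100
bounced_unsure = without_sub_address * 10 // 100
ria_manual = 1 + bounced_unsure

# With 80% Auto-Resend, only 20% of those bounces need manual attention.
ria_with_auto_resend = round(ria_manual * 20 / 100)

for label, challenges in [("RIA", ria_manual),
                          ("RIA + Auto-Resend", ria_with_auto_resend)]:
    print(label, challenges,
          round(conventional_cr / challenges),
          round(selective_cr / challenges),
          round(filter_junked / challenges))
```

Changing any of the percentages at the top shows how sensitive the comparison is to the assumptions.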

Different assumptions with different numbers will produce varied results but every common scenario will demonstrate that RIA is orders of magnitude better than C/R. RIA is also vastly superior to conventional filters as it will bounce fewer emails than conventional filters will junk.

12.2 RIA Becomes Unambiguously Superior to Conventional Filtering As the Bounce Rate of RIA Approaches the False Positive Rate of Existing Anti-Spam Services
Senders dislike having to answer challenges but they almost certainly have a far greater dislike of having their emails erroneously placed in a recipient’s spam folder. C/R systems are generally viewed as less preferable to conventional filtering as the number of challenges issued is grossly out of proportion to the number of false positives generated by conventional filtering.

RIA will issue visible challenges at a profoundly lower rate than C/R systems through a unique combination of Receiver Generated SPF, sub-addresses, the identification of trusted forwarders for individuals, and Auto-Resend. The challenge rate of RIA may approach or become less than the false positive rate of existing filters; at this point almost every sender will prefer that their recipients have RIA as opposed to conventional filtering. Senders and RIA users both benefit, as spam is reduced while receipt of ham actually improves.

13 Conclusion
RIA is practical to implement, does not require outsiders to update software before it works, authenticates all questionable email, and will almost entirely block spam without compromising delivery of ham. Innocent third parties will experience a relatively inconsequential increase in erroneous bounces despite widespread adoption of RIA. Email users, almost without exception, will not have to alter their current practices.


References
Arar, Yardena. Spam Explodes, but You Can Fight Back PC World Accessed March 21, 2007 http://www.webcitation.org/5NWDPTz47


Background Reading [SpamBayes] Accessed March 21, 2007 http://www.webcitation.org/5NWDqOB9W


Brockman, Peter. The Spam Index Report: Comparing Real-World Performance of Anti-Spam Technologies Brockman & Company July, 2007 - available via http://www.brockmann.com/index.php?option=com_content&task=view&id=843&Itemid=69


Dickenson, John (May 6, 2005). Radicati Group Expects E-Mail Client Growth to Continue Messaging Pipeline Accessed March 21, 2007 http://www.webcitation.org/5NWIMHRVH


DK Use Doubles, SPF Not So Much (January 12, 2006) Accessed March 21, 2007 http://www.webcitation.org/5NWD6SISO


Gilles, Tom (2006). Internet Security Trends for 2007 IronPort Systems, Inc. http://www.ironport.com/pdf/ironport_trend_report.pdf


Internet Email Traffic Emergency: Spam “Bounce” Messages are Compromising Networks (April, 2006) Ironport - available via http://www.ironport.com/bouncereport/


IronPort Study on Email Authentication Reveals Significant Adoption (April 19, 2006). Ironport Accessed March 21, 2007 http://www.webcitation.org/5NWCgyfqs


Keizer, Gregg. Spammers’ bot cracks Microsoft CAPTCHA (February 8, 2008) Accessed March 19, 2008 http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9061558 (Archived by WebCite® at http://www.webcitation.org/5WRX7EA1f)

 

Mail (MX) Server Survey (March 1, 2007) Accessed March 21, 2007 http://www.webcitation.org/5NWIW4GZO


McIsaac, Joe (February 6, 2006). Supplemental addresses (was: Indirection as a useful tool) Online Posting ASRG Forum Archive Accessed March 22, 2007 http://www.webcitation.org/5NXNCsXl1


Scarrow et al. (April 19, 2007). Reputation Building on Authentication Authentication & Online Trust Alliance 2007 Summit Accessed August 14, 2007 http://www.aotalliance.org/summit2007/2007_presents/701_building_on_reputationw.pdf


Taylor, Bradley (July 27-28, 2006). Sender Reputation in a Large Email Service CEAS 2006 Third Conference on Email and AntiSpam  http://www.ceas.cc/2006/19.pdf


Web-based e-mail shares: Hotmail - 35.5%, Yahoo! Mail - 35.1% (August 30th, 2005) Accessed March 21, 2007 http://blogs.zdnet.com/ITFacts/?p=8821


 

 

Footnotes
[1] Bradley Taylor, “Sender Reputation in a Large Email Service,” CEAS 2006 Third Conference on Email and AntiSpam, July 27-28, 2006  http://www.ceas.cc/2006/19.pdf

[2] “IronPort Study on Email Authentication Reveals Significant Adoption,” April 19, 2006 Accessed March 21, 2007 http://www.webcitation.org/5NWCgyfqs

[3] “DK Use Doubles, SPF Not So Much,” January 12, 2006 Accessed March 21, 2007 http://www.webcitation.org/5NWD6SISO

[4] Yardena Arar, “Spam Explodes, but You Can Fight Back,” PC World Accessed March 21, 2007 http://www.webcitation.org/5NWDPTz47

[5] “Background Reading [SpamBayes],” Accessed March 21, 2007 http://www.webcitation.org/5NWDqOB9W

[6] Tom Gilles, “Internet Security Trends for 2007” 2006 IronPort Systems, Inc. pg.8 http://www.ironport.com/pdf/ironport_trend_report.pdf

[7] “Web-based e-mail shares: Hotmail - 35.5%, Yahoo! Mail - 35.1%,” August 30th, 2005 Accessed March 21, 2007 http://blogs.zdnet.com/ITFacts/?p=8821

[8] John Dickenson, “Radicati Group Expects E-Mail Client Growth To Continue,” Messaging Pipeline May 6, 2005 Accessed March 21, 2007 http://www.webcitation.org/5NWIMHRVH

[9] “Mail (MX) Server Survey,” March 1, 2007 Accessed March 21, 2007 http://www.webcitation.org/5NWIW4GZO

[10] Taylor 4.

[11] Gregg Keizer, “Spammers' bot cracks Microsoft's CAPTCHA,” Accessed March 19, 2008 http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9061558 (Archived by WebCite® at http://www.webcitation.org/5WRX7EA1f)

[12] Gilles 7.

[13] John Scarrow et al., “Reputation Building on Authentication,” Authentication & Online Trust Alliance 2007 Summit Slide 9 and 13 Accessed August 14, 2007 http://www.aotalliance.org/summit2007/2007_presents/701_building_on_reputationw.pdf

[14] Joe McIsaac, “Supplemental addresses (was: Indirection as a useful tool)” Online Posting February 6, 2006 ASRG Forum Archive Accessed March 22, 2007 http://www.webcitation.org/5NXNCsXl1

[15] “Internet Email Traffic Emergency: Spam “Bounce” Messages are Compromising Networks,” Ironport April, 2006 - available via http://www.ironport.com/bouncereport/

 

[16] McIsaac

[17] Peter Brockman “The Spam Index Report: Comparing Real-World Performance of Anti-Spam Technologies” Brockman & Company July 2007 - available via http://www.brockmann.com/index.php?option=com_content&task=view&id=843&Itemid=69