Tech Tip: Reduce spam with obfuscation

Author: Gustav Aagesen

The problem

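Publishing e-mail addresses on the Internet increases e-mail spam. This is mainly due to spambots harvesting addresses from websites.

The most common workarounds all come at a cost to the visitor. Replacing the e-mail address with an image means it cannot be copied or clicked, leaving the address unlinked forces the visitor to type it in, and rewrites such as (at) have to be corrected by hand before the address can be used.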

The solution
  1. Use the old, but not so widely known, obfuscation technique.
    • Replaces all normal characters with SGML entities.
    • Keeps the addresses human-readable and clickable.
  2. Use runtime server-side scripting or pre-obfuscation through third-party applications (provided below).

The solution explained

Every character has an SGML equivalent, or code. SGML equivalents are commonly used to represent special characters such as the copyright symbol ©. The code consists of a prefix, & (&# for numeric codes), and a postfix, ;. In the case of the copyright symbol the escape is &copy;.

All characters have numbers assigned to them (for the copyright symbol and some other symbols, named aliases are also available). For instance, the @ is &#64;.

With this known, we see that replacing every single character in an e-mail address creates a string five to six times the length of the original: each character becomes &#, a two- or three-digit number, and ;. The resulting address string is not easily readable in the source code, but the web browser interprets it as ordinary characters.
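The length claim is easy to check. The following is a minimal sketch that encodes every character with PHP's built-in ord() and compares the two lengths. The helper name encode_all and the address user@example.com are made up for illustration and are not part of the original tip.

```php
<?php
// Hypothetical helper: replace every byte of the input with its numeric entity.
function encode_all(string $text): string {
    $out = '';
    for ($i = 0; $i < strlen($text); $i++) {
        // ord() returns the byte value of the character, e.g. 64 for "@"
        $out .= '&#' . ord($text[$i]) . ';';
    }
    return $out;
}

$address = 'user@example.com';   // example address, an assumption
$encoded = encode_all($address);

echo $encoded, "\n";
echo strlen($address), ' -> ', strlen($encoded), "\n";
```

Characters with two-digit codes (such as @, the dot, and the letters a to c) take five characters each, the rest of the lowercase alphabet takes six, so the growth factor for a typical address lands a little under six.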

One would guess that spambots are in a hurry and traverse the web looking for @'s (and in some cases mailto:'s). Few of them are tuned to look for SGML entities, mostly because of the extra parse time.

I have provided an explanation of how to obfuscate using PHP, and a working obfuscator for you to use.
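To make the reasoning concrete, here is a rough sketch of what a hurried harvester might do. The regular expression is an assumed, simplified stand-in for a real spambot's matching logic, and the address is an example. The last step shows the limit of the technique: a bot that is willing to decode entities first will still find the address.

```php
<?php
// Assumed, simplified harvester pattern: plain text containing a literal "@".
$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';

$plain      = '<a href="mailto:user@example.com">user@example.com</a>';
// "user@example.com" with every character written as a numeric entity
$obfuscated = '&#117;&#115;&#101;&#114;&#64;&#101;&#120;&#97;&#109;'
            . '&#112;&#108;&#101;&#46;&#99;&#111;&#109;';

$found_plain      = preg_match($pattern, $plain);       // literal "@" is present
$found_obfuscated = preg_match($pattern, $obfuscated);  // no literal "@" to match

// A bot that spends the extra parse time can decode the entities first.
$decoded       = html_entity_decode($obfuscated, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$found_decoded = preg_match($pattern, $decoded);

var_dump($found_plain, $found_obfuscated, $found_decoded);
```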

Method 1 Runtime server-side scripting

For runtime translation of spambot-sensitive information you will need to write a function that accepts a string as input and replaces each character separately with its SGML equivalent.

Shown below is a PHP example that does the necessary work. It needs an $sgml_entities array that maps each character to its numeric code.

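function obfuscate($_address){
	// The lookup table is defined outside the function, so bring it into scope.
	global $sgml_entities;
	// Initialise up front so the return value is always defined.
	$obfuscated = '';
	if(isset($_address)){
		$clean = trim($_address);
		for($i = 0; $i < strlen($clean); $i++){
			$char = substr($clean,$i,1);
			if(isset($sgml_entities[$char])){
				// Known character: emit its numeric entity, e.g. &#64; for @
				$obfuscated .= '&#'.$sgml_entities[$char].';';
			}else{
				// Unknown character: pass it through unchanged
				$obfuscated .= $char;
			}
		}
	}
	return $obfuscated;
}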

The following is a list of the necessary SGML entities
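The list itself is not reproduced here. As a stand-in, the sketch below builds a comparable table for the printable ASCII range, on the assumption that $sgml_entities maps each character to its decimal character code (which is what the '&#'...';' concatenation in the function above expects). The range 32 to 126 is a choice made for this example.

```php
<?php
// Assumption: the table maps a character to its decimal character code.
$sgml_entities = array();
for ($code = 32; $code <= 126; $code++) {
    // chr() turns the code back into the one-character string used as the key
    $sgml_entities[chr($code)] = $code;
}

echo $sgml_entities['@'], "\n";   // code for the at sign
echo $sgml_entities['.'], "\n";   // code for the dot
echo count($sgml_entities), "\n"; // number of characters covered
```

Characters outside the table, such as accented letters, would simply be passed through unchanged by the function.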

Method 2 Pre-obfuscation through third-party applications

If you are in a non-code environment you will need to pre-obfuscate the spambot-sensitive information.

For e-mail link translations it is important to encode the complete link, including the mailto: protocol identifier.
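As an illustration of encoding the complete link, the sketch below produces the kind of markup a pre-obfuscation tool might hand back. The helper name to_entities and the address are hypothetical. Only the attribute value and the link text are encoded, while the tag itself stays literal so the browser still sees a normal link.

```php
<?php
// Hypothetical helper: numeric entity for every byte of the input.
function to_entities(string $text): string {
    $out = '';
    foreach (str_split($text) as $char) {
        $out .= '&#' . ord($char) . ';';
    }
    return $out;
}

$address = 'user@example.com';                 // example address, an assumption
$href    = to_entities('mailto:' . $address);  // protocol identifier included
$label   = to_entities($address);              // visible link text

// Paste-ready markup: neither "mailto:" nor "@" appears literally in it.
$link = '<a href="' . $href . '">' . $label . '</a>';
echo $link, "\n";
```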

How to use it

  1. Enter your e-mail address below.
  2. Take the obfuscated result and add it to the e-mail link on your website.

Conclusion

In this example I have shown how to reduce spam through obfuscation.

The technique is not 100% proof, as spambots keep growing more sophisticated, but it is still considered among the best.

Questions or feedback can be sent to tormel@gmail.com
