Author: Gustav Aagesen
Every character has an SGML equivalent, or code. The SGML equivalents are commonly used to represent special characters such as the copyright symbol © and similar. The code consists of a prefix and a postfix, & and ;. In the case of the copyright symbol the escape is &copy;.
All characters have numbers assigned to them (for the copyright symbol and some other symbols, named aliases are also available). For instance, the @ is &#64;
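As a quick check of this mapping, the short sketch below uses PHP's built-in ord() and html_entity_decode() functions. The variable names are only for illustration and are not part of the original tutorial.

```php
<?php
// Numeric code assigned to the at sign.
$code = ord('@');                       // 64

// Build the numeric entity by wrapping the code in "&#" and ";".
$numeric = '&#' . $code . ';';          // "&#64;"

// Decoding shows that the alias and the number name the same character.
$flags       = ENT_QUOTES | ENT_HTML5;
$fromNumeric = html_entity_decode($numeric, $flags, 'UTF-8');
$fromAlias   = html_entity_decode('&copy;', $flags, 'UTF-8');
$fromCode    = html_entity_decode('&#169;', $flags, 'UTF-8');

echo $numeric, "\n";
echo $fromNumeric, "\n";
var_dump($fromAlias === $fromCode);
```

The last line compares the decoded alias with the decoded number, which is the sense in which &copy; is an alias for character number 169.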
With this in mind, replacing every single character in an e-mail address creates a string about six times the length of the original address: a string that is not easily readable in the source code but is interpreted by the web browser as ordinary characters.
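The six-fold figure can be checked with a little arithmetic. The sketch below assumes plain ASCII input and uses a placeholder address; it compares numeric entities written with and without zero-padding, since only the padded form (such as &#064;) is exactly six characters for every ASCII character.

```php
<?php
$address  = 'user@example.com';         // placeholder address, 16 characters
$unpadded = '';
$padded   = '';

foreach (str_split($address) as $char) {
    $unpadded .= '&#' . ord($char) . ';';        // e.g. "&#64;"  (5 or 6 characters)
    $padded   .= sprintf('&#%03d;', ord($char)); // e.g. "&#064;" (always 6 characters)
}

echo strlen($address), "\n";            // 16
echo strlen($unpadded), "\n";           // 92
echo strlen($padded), "\n";             // 96
```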
One would guess that spambots are in a hurry and traverse the web looking for @ signs (and in some cases for mailto: prefixes). Few of them are tuned to look for SGML entities, mostly because of the extra parsing time.
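To illustrate why this helps against simple harvesters, here is a deliberately naive sketch of the kind of pattern match such a bot might run on raw page source. The regular expression and the function name find_addresses() are assumptions made for illustration; they are not taken from any real spambot.

```php
<?php
// Hypothetical harvester: look for anything shaped like an address.
function find_addresses(string $source): array
{
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';
    preg_match_all($pattern, $source, $matches);
    return $matches[0];
}

$plain      = '<a href="mailto:user@example.com">mail me</a>';
// The same link with the at sign written as a numeric entity.
$obfuscated = '<a href="mailto:user&#64;example.com">mail me</a>';

print_r(find_addresses($plain));        // finds user@example.com
print_r(find_addresses($obfuscated));   // finds nothing: no literal @ in the source
```

A harvester that first decodes entities (for example with html_entity_decode()) would still find the address, which is the extra parsing step described above.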
I have provided an explanation of how to obfuscate using PHP and a working obfuscator for you to use.
For runtime translation of spambot-sensitive information you will need to code a function that accepts a string as input and replaces each character separately with its SGML equivalent.
Shown below is a PHP example that does the necessary work. It needs an $sgml_entities array to work.
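function obfuscate($_address){
    // The lookup array is defined outside the function, so import it.
    global $sgml_entities;
    $obfuscated = '';
    if(isset($_address)){
        $clean = trim($_address);
        for($i = 0; $i < strlen($clean); $i++){
            $char = substr($clean,$i,1);
            if(isset($sgml_entities[$char])){
                // Wrap the entity in its prefix and postfix, & and ;
                $obfuscated .= '&'.$sgml_entities[$char].';';
            }else{
                // No entity known for this character: keep it unchanged.
                $obfuscated .= $char;
            }
        }
    }
    return $obfuscated;
}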
The following is a list of necessary SGML entities
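The list itself is not reproduced here. As a stand-in, the sketch below builds a comparable array programmatically. It assumes that each value holds the part between the & prefix and the ; postfix (for example #64 for @), which is a guess at the format and not the author's actual list, and it covers printable ASCII only.

```php
<?php
// Assumed format: character => "#" followed by its decimal code.
// Covers printable ASCII only (codes 32 to 126).
$sgml_entities = [];
for ($code = 32; $code <= 126; $code++) {
    $sgml_entities[chr($code)] = '#' . $code;
}

echo $sgml_entities['@'], "\n";         // #64
echo $sgml_entities['a'], "\n";         // #97
echo count($sgml_entities), "\n";       // 95
```

With such an array, wrapping a looked-up value in & and ; turns @ into &#64;.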
If you are in a non-code environment you will need to pre-obfuscate the spambot-sensitive information.
For e-mail link translations it is important to encode the complete link, including the mailto: protocol identifier.
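A minimal sketch of what encoding the complete link can look like. The helper encode_all() and the address are placeholders introduced here; the helper computes numeric codes with ord() instead of using a lookup array, and it assumes ASCII input.

```php
<?php
// Hypothetical helper: turn every byte of an ASCII string into a numeric entity.
function encode_all(string $text): string
{
    $out = '';
    foreach (str_split($text) as $char) {
        $out .= '&#' . ord($char) . ';';
    }
    return $out;
}

$address = 'user@example.com';                  // placeholder address
$href    = encode_all('mailto:' . $address);    // protocol identifier included
$label   = encode_all($address);                // visible link text

$link = '<a href="' . $href . '">' . $label . '</a>';
echo $link, "\n";
```

The resulting markup contains neither a literal @ nor a literal mailto:, yet a browser decodes both the href and the link text back to the original address. The output can also be pasted into a static page, which covers the pre-obfuscation case.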
In this example I have shown how to reduce spam through obfuscation.
The technique is not 100% spam-proof, as spambots are becoming more sophisticated, but it is still considered among the best.
Questions or feedback can be sent to tormel@gmail.com