I’ve discussed comment spamming previously – automatically generated comments that are posted to blogs in an attempt to game the Google search rankings.
The content of the comments is usually in the form of a generic compliment, in a naive attempt to stroke the ego of the blog owner and have the comment remain in place. The true “payload” of the spam is a link to a website selling products, with the spammer hoping that the link stays in place until Google finds it, which will enhance the search engine ranking of the linked website.
Most of the comment spam posted today is posted by automated software to thousands of websites, so the text of each message is easily found on Google. Here is one example of this kind of software. I’m not going to link to the dirtbags behind it, because doing so would improve their Google ranking, but this slimeball is currently the top Google hit for “wordpress comment spammer“.
The software is designed to help the user evade spam protection on blogs. One of the available options enables the use of proxy servers, which spreads the spam across multiple IP addresses instead of the single address belonging to the spammer. The content itself is also randomised: the input is a set of customisable list files, allowing each comment to have a different author, email address and comment text.
It also seems that some software is smart enough to vary the words used inside the sentences it generates, which is all well and good until it fails, as in the example comments below.
At first glance it appears that the spammer’s software does not support merge fields inside HTML <b> tags: the bold text is a syntactical mess, while the normally formatted text below it differs slightly between the two comments, showing that the word merging worked there.
A second way that spammers customise their automated comments is by including the article title inside the comment. In most cases this looks ridiculous, because the software blindly copies the contents of the page’s HTML <title> tag, which also includes the website name: for example, the title of this page is “What’s behind WordPress comment spam? | Waking up in Geelong”.
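To make the problem concrete, here is a minimal Python sketch of that naive approach, using only the standard library’s `html.parser`. The function names are hypothetical, and the `article_title` heuristic (assume the article name comes before a “ | ” separator) is my own assumption; title formats vary between blog themes, and I have no idea what the spamming software actually does internally.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def naive_title(html: str) -> str:
    # Takes the <title> text verbatim, site name and all.
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def article_title(html: str, separator: str = " | ") -> str:
    # Hypothetical heuristic: keep only the part before the separator,
    # assuming the site name comes after it.
    return naive_title(html).split(separator)[0]


page = ("<html><head><title>What’s behind WordPress comment spam? | "
        "Waking up in Geelong</title></head><body>...</body></html>")
print(naive_title(page))    # What’s behind WordPress comment spam? | Waking up in Geelong
print(article_title(page))  # What’s behind WordPress comment spam?
```

Even the trimmed version is only a guess: a theme that puts the site name first, or an article title that itself contains “ | ”, would still trip it up.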
However, one piece of spamming software has an even more obvious bug, which shows up when a URL redirects to another page instead of displaying an article. An example of this failure can be seen in the comment posted on my article “MTR East Rail Line: an intro”, which was originally located at this domain but moved to my Hong Kong-focused website Checkerboard Hill a few months ago. Because I created an HTTP 301 redirect so that visitors could find the article at its new address, the spamming software was tricked: it appears to have read the title of the redirect response itself instead of following it to the article, and so believed the page was titled “301 Moved Permanently”.
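The software’s internals aren’t public, so the following is only a guess at the failure mode: a minimal Python sketch assuming the tool fetched the old URL with an HTTP client that does not follow redirects, then scraped the <title> of whatever came back. The local test server, the `/old-article` and `/new-article` paths, the page bodies and the helper names are all hypothetical stand-ins; the only real APIs used are from Python’s standard library (`http.server`, `http.client` and `urllib.request`).

```python
import http.client
import re
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pages: /old-article permanently redirects to /new-article.
REDIRECT_BODY = (b"<html><head><title>301 Moved Permanently</title></head>"
                 b"<body><h1>Moved Permanently</h1></body></html>")
ARTICLE_BODY = (b"<html><head><title>MTR East Rail Line: an intro</title>"
                b"</head><body>...</body></html>")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-article":
            # A 301 carries a Location header plus a small HTML body of its own.
            self.send_response(301)
            self.send_header("Location", "/new-article")
            body = REDIRECT_BODY
        else:
            self.send_response(200)
            body = ARTICLE_BODY
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def extract_title(html: bytes) -> str:
    match = re.search(rb"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).decode() if match else ""


def title_without_following(host, port, path):
    # http.client never follows redirects, so this sees the 301 page itself.
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", path)
    html = conn.getresponse().read()
    conn.close()
    return extract_title(html)


def title_following(host, port, path):
    # urllib's default opener follows the Location header to the new page.
    # ProxyHandler({}) disables any proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}{path}") as resp:
        return extract_title(resp.read())


def fetch_both_titles(path="/old-article"):
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return (title_without_following(host, port, path),
                title_following(host, port, path))
    finally:
        server.shutdown()
        server.server_close()


naive, followed = fetch_both_titles()
print(naive)     # 301 Moved Permanently
print(followed)  # MTR East Rail Line: an intro
```

A client that honours the `Location` header ends up at the real article; one that doesn’t gets the redirect’s own placeholder page, whose title is exactly the “301 Moved Permanently” text that ended up in the spam comment.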
Despite these glitches, I would assume spammers aren’t particularly concerned about the bugs in their software, because as long as their deluge of useless comments continues to flood the web and search engines continue to be tricked, they are achieving their goal.
- Spam in blogs: an introduction to comment spam, at Wikipedia.