
How to retaliate against the Yandex bot (Updated)

ooga booga

A few days ago I noticed that my site’s bandwidth usage was suddenly up. And I mean way up. Bandwidth is expensive, so I dug into the server logs and found that one particular computer was repeatedly accessing every page on my domain, several times a day. Further research revealed that the culprit is a bot that indexes web pages for a Russian search engine called Yandex.
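Finding a heavy hitter like this mostly comes down to tallying the access log per client. Here is a minimal Python sketch of that, assuming the server writes Apache's "combined" log format. The function name bandwidth_by_host and the sample lines (which use documentation IP ranges) are hypothetical and are not taken from this site's logs.

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r' "[^"]*" "[^"]*"'
)


def bandwidth_by_host(lines):
    """Return (bytes per host, requests per host) as two Counters."""
    bytes_per_host = Counter()
    hits_per_host = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that aren't in combined format
        host = m.group("host")
        hits_per_host[host] += 1
        if m.group("bytes") != "-":  # "-" means no response body was sent
            bytes_per_host[host] += int(m.group("bytes"))
    return bytes_per_host, hits_per_host


# Hypothetical sample lines using documentation IP ranges.
SAMPLE = [
    '192.0.2.10 - - [10/Jun/2009:10:00:00 +0000] "GET /a.htm HTTP/1.1" 200 5000 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/Jun/2009:10:00:05 +0000] "GET /b.htm HTTP/1.1" 200 7000 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Jun/2009:11:00:00 +0000] "GET /a.htm HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    by_bytes, by_hits = bandwidth_by_host(SAMPLE)
    # Heaviest bandwidth consumers first.
    for host, total in by_bytes.most_common():
        print(host, by_hits[host], "requests,", total, "bytes")
```

Sorting by bytes rather than by request count matters here: a client that pulls every page several times a day will float to the top even if its individual requests look harmless.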
My attempts to rebuff the Yandex bot using the familiar robots.txt method failed utterly. Yandex bots ignore that file, which causes no small amount of stomach acid online among people like me who don’t have money to burn.
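The reason robots.txt can be ignored at all is that it is purely advisory: the web server never enforces it, and each crawler has to fetch the file and check its own user agent against it. The sketch below plays the part of a well-behaved crawler using Python's standard urllib.robotparser module. The two-line robots.txt and the helper name crawler_may_fetch are hypothetical examples, not this site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking any crawler that calls itself Yandex to stay out.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /
"""


def crawler_may_fetch(user_agent: str, url: str) -> bool:
    """What a compliant crawler would decide; nothing here stops a rude one."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(crawler_may_fetch("Yandex", "http://example.com/index.htm"))     # False
    print(crawler_may_fetch("Googlebot", "http://example.com/index.htm"))  # True
```

The check happens entirely inside the crawler, so a bot that never runs it sees no difference at all.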
I decided to retaliate.

I added the following lines to my .htaccess file, so that every time a bot whose name begins with Yandex tries to access my site it gets a 403 error instead of downloading the page it’s trying to see.

# block known troublemakers dumb enough to
# announce who they are
SetEnvIfNoCase User-Agent "^Yandex" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
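SetEnvIfNoCase does a case-insensitive regular-expression match on the named header, and because the pattern is anchored with ^, only user agents that begin with "Yandex" get flagged. The Python sketch below models that decision. The helper names is_bad_bot and response_status are hypothetical, and the model simplifies the access rules to "403 if flagged, else 200" while ignoring the request method.

```python
import re

# Mirrors: SetEnvIfNoCase User-Agent "^Yandex" bad_bot
BAD_BOT = re.compile(r"^Yandex", re.IGNORECASE)


def is_bad_bot(user_agent: str) -> bool:
    """True if this User-Agent header would set the bad_bot variable."""
    return BAD_BOT.search(user_agent) is not None


def response_status(user_agent: str) -> int:
    """Simplified outcome: Deny from env=bad_bot wins under Order Allow,Deny."""
    return 403 if is_bad_bot(user_agent) else 200


if __name__ == "__main__":
    for agent in ["Yandex/1.0", "yandexbot", "Mozilla/5.0 (compatible; YandexBot/3.0)"]:
        print(agent, "->", response_status(agent))
```

A string like "Mozilla/5.0 (compatible; YandexBot/3.0)" slips through because "Yandex" is not at the start. Dropping the ^ would catch it, at the cost of matching "Yandex" anywhere in the header.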

The bandwidth dropped back down to where it used to be, but I noticed one stupid Yandex bot kept coming back from a single IP address even when I fed it a never-ending stream of 403 errors. Since every static page on my site ends in .htm and only my 403 error page ends in .shtml, I got nasty and added these lines to my .htaccess file to target every visitor from that address who tries to access a page ending in .shtml:

# permanently redirect .shtml requests from one specific IP
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^77\.88\.26\.27$
RewriteRule \.shtml$ [R=301,L]
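Taken together, the RewriteCond and RewriteRule redirect a request only when both the client address and the requested path match. Here is a minimal Python model of that decision, assuming the address pattern is anchored with ^ and $ so that something like 177.88.26.27 doesn't also match. The function name rewrite and the value of REDIRECT_TARGET are hypothetical placeholders, not the rule's actual destination.

```python
import re

# Condition and rule patterns from the .htaccess above, address anchored.
BLOCKED_ADDR = re.compile(r"^77\.88\.26\.27$")
SHTML_PATH = re.compile(r"\.shtml$")
# Hypothetical placeholder, not the real destination of the redirect.
REDIRECT_TARGET = "https://example.com/elsewhere"


def rewrite(remote_addr: str, path: str):
    """Return (status, location) when the rule fires, else None (serve normally)."""
    if BLOCKED_ADDR.search(remote_addr) and SHTML_PATH.search(path):
        return (301, REDIRECT_TARGET)  # R=301: permanent redirect; L: stop rewriting
    return None
```

Because only the 403 error page ends in .shtml, ordinary visitors never hit the rule, and the blocked address only triggers it after it has already been refused.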

That Yandex bot now gets rickrolled every time it tries to index my site. Problem solved.

6/30 Update: Looks like the Yandex bots have gone away. My server logs show zero hits from that domain. Now witness the firepower of this fully ARMED and OPERATIONAL battle station!
firepower -- screw subtlety
#^@&ing spammers.

8/13 Update: Even better!