How to Block Bad Bots?

July 10, 2013 / General Discussion

Before you block bad bots, you need to know at least one of two things: the IP addresses the bots come from, or the User Agent String the bots send. The easiest way to find these is to look in your raw web logs or in AWStats.

An IP address (IPv4) is a series of four numbers separated by dots, for example “127.0.0.1”.

The User Agent String is the name that the program accessing your website goes by. For example, version 9.51 of the Opera web browser sends, among others, the user agent string Opera/9.51 (Windows NT 5.1; U; en), while the Google search engine bots identify themselves with Googlebot/2.1.

You do not need to know the entire user agent string. Just find a part of it that is unique to that specific bot and that no other bot or web browser uses.

Note down the IP addresses used by the bots and their user agent strings. Once you have them, you can follow either one of the steps below or both. Using both steps is recommended for better security.

  1. Blocking Bad Bots by using IP

    You can block bots by their IP addresses. Just add the following code to your .htaccess file, replacing “IP address” with the address you want to block.

    Code:
    Order Deny,Allow
    Deny from IP address

    If you have more than one IP address to block, just add another “Deny from” line with that IP address underneath.

    Code:
    Order Deny,Allow
    Deny from IP address
    Deny from IP address

    For example:
    root@support[~]# vi .htaccess
    # Deny the listed addresses and allow everyone else.
    Order Allow,Deny
    Allow from all
    # Specific IP address
    Deny from 79.133.196.50
    # Subnet range in CIDR notation
    Deny from 79.133.196.0/24
    # Partial IP address (every address starting with 79.133)
    Deny from 79.133
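
    The Order, Allow and Deny directives above come from Apache 2.2. On Apache 2.4 they only work through the mod_access_compat module, and the native syntax uses Require instead. As a rough sketch, assuming your server runs Apache 2.4 and permits authorization directives in .htaccess (AllowOverride AuthConfig), similar blocks could be written like this, reusing the example address from above together with an illustrative /24 range:

    Code:
    # Apache 2.4 sketch: allow everyone except the listed addresses
    <RequireAll>
        Require all granted
        # Specific IP address
        Require not ip 79.133.196.50
        # Subnet range in CIDR notation (illustrative)
        Require not ip 79.133.196.0/24
    </RequireAll>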

    You can also block a whole country by using the mod_geoip module, provided it was compiled into Apache at build time.

    Edit your .htaccess file:
    GeoIPEnable On
    SetEnvIf GEOIP_COUNTRY_CODE CN BlockThese
    SetEnvIf GEOIP_COUNTRY_CODE TR BlockThese
    # Add more countries here
    Deny from env=BlockThese

    A full list of two-letter country codes can be found here:
    http://www.iso.org/iso/country_codes/iso_3166_code_lists/country_names_and_code_elements

    The country codes used in the above example are CN = China and TR = Turkey.

    Note: There is no guarantee of how your website will perform once you add blocks to it. Keep in mind that the larger the .htaccess file, the slower your website will load, because the server has to process the file on every request.

    IP addresses change and are added to country ranges over time, so remember to update the list accordingly.

  2. Blocking the bad bots by using User Agent String:

    You can place the following code at the bottom of your .htaccess file. If you do not already have a file called .htaccess in your website’s root directory, you can create a new one.

    Code:
    #get rid of the bad bot
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^BadBot
    RewriteRule ^(.*)$ http://go.away/

    The above lines tell your web server to check for any bot whose user agent string starts with “BadBot” and to redirect its requests to http://go.away/.

    To block more than one bad bot, add one condition per bot and join the conditions with [OR]:

    Code:
    #get rid of bad bots
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FakeUser
    RewriteRule ^(.*)$ http://go.away/
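
    Redirecting to a nonexistent host works, but the server can also refuse the request outright. The following is a hedged variant, not part of the original recipe: it matches the example names anywhere in the user agent string rather than only at the start, ignores case with the [NC] flag, and answers with 403 Forbidden through the [F] flag. It assumes mod_rewrite is enabled on your server.

    Code:
    #refuse bad bots with 403 Forbidden
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper|FakeUser) [NC]
    RewriteRule ^ - [F]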

    Note: Replace “BadBot”, “EvilScraper” and “FakeUser” with the User Agent Strings that you find in your logs. With the methods described above, you can block specific bad bots from accessing your website by their IP addresses, by their User Agent String, or by both.
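
    As a rough illustration of using both steps together, the sketch below blocks by IP address and by User Agent String in one place. It uses SetEnvIfNoCase from mod_setenvif instead of mod_rewrite, which is a swapped-in technique rather than the method described above. The variable name bad_bot is made up for this example, and the address and bot names are the placeholders used earlier.

    Code:
    #flag requests whose user agent contains a bad bot name
    SetEnvIfNoCase User-Agent "BadBot" bad_bot
    SetEnvIfNoCase User-Agent "EvilScraper" bad_bot
    #allow everyone except flagged bots and the listed IP address
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    Deny from 79.133.196.50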