Robots.txt: Do All Bots Comply with Exclusion Rules?

July 26, 2012 / Search Engines

Unfortunately, not all search bots and spiders comply with robots exclusion rules, nor are they required to. While we’re not lawyers (and we could be wrong), as far as we’re aware, there is no U.S. law prohibiting search engines from ignoring robots.txt exclusions on websites. That doesn’t mean there’s no point in using them, however: most of the major search engines, including Google and Bing/Yahoo!, comply with robots.txt exclusions.
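To illustrate why compliance is voluntary, here is a minimal sketch of the check a well-behaved crawler might run before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the bot name "ExampleBot", and the example.com URLs are assumptions made up for this illustration; a non-compliant bot simply skips this step.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the example.com URLs are placeholders.
    for url in ("https://example.com/private/report.html",
                "https://example.com/index.html"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

Nothing on the server enforces the result of this check. The crawler itself decides whether to honor it, which is why exclusion rules only work against bots that choose to follow them.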

Which search engines do not comply with robots.txt exclusions?

We have reason to suspect that Baidu, a popular search engine in China, does not comply with robots.txt exclusions. Some smaller search engines worldwide may also ignore them.

How do I prevent non-compliant search engines from crawling my website or a specific folder?

The only plausible way to stop them is likely to blacklist their IPs on your Dedicated Server or your individual hosting account. You may need to blacklist an entire IP range for the block to be effective. Alternatively, password-protect the specific directory.
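The difference between blacklisting a single IP and blacklisting a range can be sketched with Python's standard ipaddress module. This is only an illustration of the matching logic, not a firewall or server configuration: the addresses below come from blocks reserved for documentation and do not belong to any real crawler, and the names BLOCKED_RANGES and is_blocked are hypothetical.

```python
import ipaddress

# Hypothetical blacklist. These are reserved documentation addresses,
# not the addresses of any real search engine.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole range (256 addresses)
    ipaddress.ip_network("198.51.100.17/32"),  # a single IP
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blacklisted range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.45", "198.51.100.17", "198.51.100.18"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

In this sketch, every address in 192.0.2.0/24 is refused, while only the one listed address in 198.51.100.x is. A bot that rotates through neighboring addresses would slip past the single-IP entry, which is why blocking a range tends to be more effective.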

