  #1 (permalink)  
Old 05-30-11, 03:57
BOD Member
 
Join Date: May 2011
Posts: 47
What is Robots.txt?

Hi!
I would like to know: what is robots.txt? Please share your thoughts on it.
  #2 (permalink)  
Old 05-30-11, 04:35
BOD Member
 
Join Date: Dec 2010
Posts: 227

A robots.txt file is simply a text file that resides in your web site’s root directory and tells robots which parts of the site they may visit. It is most often used to ask search engines or other robots not to crawl some or all of the site. It cannot block specific visitors based on their IP address: if somebody is abusing your site, you need to block their IP address at the server level (for example in .htaccess), not in the robots.txt file.

Some hosting accounts don't add a robots.txt file automatically, while others leave the file blank by default. The exact configuration of the file depends on the web hosting provider. It is always best to check whether the robots.txt file has been edited to disallow search engines (SE), as this could be very detrimental to your web site’s success.
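
If you want to test what a given robots.txt actually allows, here is a minimal sketch in Python using the standard library's urllib.robotparser. The file contents, the www.example.com URLs and the crawler name ExampleBot are assumptions made up for illustration, not taken from any real host.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)

# Three hypothetical robots.txt files (contents are illustrative only).
BLANK = ""                                         # blank file: nothing is disallowed
BLOCK_ALL = "User-agent: *\nDisallow: /\n"         # asks every robot to stay away
BLOCK_SOME = "User-agent: *\nDisallow: /private/\n"

HOME = "https://www.example.com/"
PRIVATE = "https://www.example.com/private/report.html"

print(can_crawl(BLANK, "ExampleBot", HOME))        # blank file
print(can_crawl(BLOCK_ALL, "ExampleBot", HOME))    # whole site disallowed
print(can_crawl(BLOCK_SOME, "ExampleBot", HOME))
print(can_crawl(BLOCK_SOME, "ExampleBot", PRIVATE))
```

A file like BLOCK_ALL is the case worth checking for: if your host ships it by default, compliant search engines will not crawl your site at all.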
  #3 (permalink)  
Old 05-30-11, 05:12
BOD Member
 
Join Date: May 2011
Posts: 47

What are the advantages of robots.txt?
  #4 (permalink)  
Old 05-30-11, 06:28
Moderator
 
Join Date: Nov 2010
Posts: 476

Just to let you know, a robots.txt file only asks well-behaved spiders not to crawl (i.e. to exclude) parts of your website. It does not provide any form of security or protection. Many spiders out there completely ignore the robots.txt file and will index everything.
  #5 (permalink)  
Old 05-30-11, 06:43
BOD Member
 
Join Date: Dec 2010
Posts: 227

Yes, I agree with you.
There are some spammy robots that don't respect robots.txt instructions. Even if you have blocked them in robots.txt, they will still crawl every page of your site.

In this case you can find their IP addresses in your traffic log and then block them using .htaccess. This will also help you cut down your bandwidth usage, since these spammy bots can consume a large share of your website's bandwidth if you do not block them.
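
As a rough sketch of that workflow, the Python below scans access-log lines for requests to paths that robots.txt disallows, counts them per IP, and prints candidate .htaccess lines. The log lines, the /private/ prefix and the threshold of 2 are all assumptions for illustration. The "Deny from" syntax is the older Apache 2.2 style; newer Apache versions use "Require not ip" instead, so check what your server expects before pasting anything in.

```python
import re
from collections import Counter

# Hypothetical access-log lines (Apache-style); IPs are documentation addresses.
LOG_LINES = [
    '203.0.113.7 - - [30/May/2011:06:40:01 -0600] "GET /private/a.html HTTP/1.1" 200 512',
    '203.0.113.7 - - [30/May/2011:06:40:02 -0600] "GET /private/b.html HTTP/1.1" 200 734',
    '198.51.100.4 - - [30/May/2011:06:41:10 -0600] "GET /index.html HTTP/1.1" 200 2048',
    '192.0.2.9 - - [30/May/2011:06:42:55 -0600] "GET /private/a.html HTTP/1.1" 200 512',
]

# Assumed to mirror the Disallow rules in your own robots.txt.
DISALLOWED_PREFIXES = ("/private/",)

# Captures the client IP and the requested path from one log line.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

def find_offenders(lines, prefixes, threshold=2):
    """Return sorted IPs that requested disallowed paths at least `threshold` times."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2).startswith(prefixes):
            hits[match.group(1)] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)

def htaccess_rules(ips):
    """Build Apache 2.2-style deny lines for the given IPs."""
    return ["Deny from " + ip for ip in ips]

offenders = find_offenders(LOG_LINES, DISALLOWED_PREFIXES)
for rule in htaccess_rules(offenders):
    print(rule)
```

The threshold is there so that a single stray request does not get an address blocked; review the list by hand before using it, since a legitimate visitor can follow a link into a disallowed path too.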
  #6 (permalink)  
Old 06-08-11, 06:21
BOD Member
 
Join Date: May 2011
Posts: 8

Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Obeying robots.txt is by no means mandatory for search engines, but generally they respect what they are asked not to do. A robots.txt file can also be more complex, i.e. it can give different instructions to different user agents.
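
To show what different instructions for different user agents can look like, here is a small sketch checked with Python's standard urllib.robotparser. The bot names ExampleImageBot and OtherBot, the paths and the www.example.com URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group per user agent.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

PHOTO = "https://www.example.com/photos/cat.jpg"
ADMIN = "https://www.example.com/admin/login"

# ExampleImageBot has its own group and is asked to stay out entirely.
print(rules.can_fetch("ExampleImageBot", PHOTO))
# Any other robot falls back to the "*" group, which only excludes /admin/.
print(rules.can_fetch("OtherBot", PHOTO))
print(rules.can_fetch("OtherBot", ADMIN))
```

A robot uses the group that names it and only falls back to the "*" group when no group matches, so the order and naming of the groups matters once the file grows.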
All times are GMT -6.

Copyright © 1999-2012, BODHost Ltd. All rights reserved.