Some good information about robots.txt
Today I found an interesting article about the usage of robots.txt:
http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
My favorite part:
The best way to use a robots.txt file is to not use it at all. Well... almost. Use it to indicate that robots have full access to all files on your website and to direct robots to your sitemap.xml file. ...
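For anyone who wants to follow that advice, here's a minimal sketch of what such a robots.txt could look like (assuming your sitemap lives at the site root - adjust the URL to your own domain):

```
# Allow all robots full access (an empty Disallow blocks nothing)
User-agent: *
Disallow:

# Point crawlers at the XML sitemap
Sitemap: http://www.example.com/sitemap.xml
```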
Hope it will be useful for the community, or for Lee.
Thanks Petr.
I read the SEOmoz blog post last week too. It's a good read - but the Robots.txt protocol is still only a guideline, not a rule. Web crawlers don't have to obey it (though ignoring it is considered a little rude!)
The beauty of the Editor is that you can use it however you like. Although it would be a nice feature to have it auto-detect an XML Sitemap! :-D
Cheers, Lee.
Just remembered about Cultiv DynamicRobots! Doh!
@Lee I don't think the major crawlers disobey the directives in robots.txt. The misconception comes from people disallowing a page or section and then still seeing it in search results. Google may still index such a page even though it won't crawl it: it gets its information from inbound links pointing to the page, which make the page look like a good source of information. It may also populate the description from sources like DMOZ, if the site was submitted to it. The way to prevent the page from showing up at all is to allow crawling and use a noindex robots meta tag instead. From what I've read, anyway.
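To illustrate: the noindex approach means the page is *not* blocked in robots.txt (so the crawler can fetch it and see the tag), and instead carries something like this in its `<head>`:

```html
<!-- Page is crawlable, but the crawler is asked to keep it out of the index -->
<meta name="robots" content="noindex">
```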