
  • Petr Snobelt 923 posts 1535 karma points
    Oct 18, 2010 @ 10:41
    Petr Snobelt

    Some good information about robots.txt

    Today I found an interesting article about the usage of robots.txt.

    My favorite part:
    The best way to use a robots.txt file is to not use it at all. Well... almost. Use it to indicate that robots have full access to all files on your website and to direct robots to your sitemap.xml file. ...
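    A minimal robots.txt along the lines the article suggests - allow everything and point robots at the sitemap - might look like this (the sitemap URL is just a placeholder for your own site's):

    ```
    # Allow all robots full access to the site
    User-agent: *
    Disallow:

    # Point crawlers at the XML sitemap
    Sitemap: http://www.example.com/sitemap.xml
    ```

    An empty Disallow line means "nothing is disallowed"; the Sitemap directive is supported by the major search engines.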

    Hope it will be useful for the community or Lee.

  • Lee Kelleher 4020 posts 15796 karma points MVP 13x admin c-trib
    Oct 18, 2010 @ 16:49
    Lee Kelleher

    Thanks Petr.

    I read the SEOmoz blog post last week too.  It's a good read - but the robots.txt protocol is still only a guideline, not a rule - web-crawlers don't have to obey it (though ignoring it is considered a little rude!)

    The beauty of the Editor is that you can use it however you like.  Although it would be a nice feature to have it auto-detect an XML Sitemap! :-D

    Cheers, Lee.

  • Lee Kelleher 4020 posts 15796 karma points MVP 13x admin c-trib
    Oct 18, 2010 @ 16:51
    Lee Kelleher

    Just remembered about Cultiv DynamicRobots! Doh!

  • Trevor 2 posts 23 karma points
    Feb 04, 2011 @ 07:52

    @Lee I don't think they disobey the directives in robots.txt. That misconception arises when people disallow a page or section but it still shows up in search results. Google will still index the page even though it will not crawl it: it gets its information from inbound links pointing to the page, which make the page still seem a good source of information. It may also populate the description from sources like DMOZ if the site was submitted there. The way to prevent the page from showing up at all is to use noindex (a robots meta tag, not a robots.txt disallow). From what I've read, anyway.
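    The noindex directive mentioned above is a meta tag placed in the page's head - and, somewhat counter-intuitively, the page must *not* be blocked in robots.txt, since Google has to be able to crawl it to see the tag. A sketch:

    ```html
    <!-- In the <head> of the page to keep out of search results.
         "noindex" stops the page being indexed; "follow" (the default)
         still lets crawlers follow its links. -->
    <meta name="robots" content="noindex" />
    ```

    The same effect can be achieved for non-HTML resources with an X-Robots-Tag HTTP response header.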
