  • Sander Houttekier 114 posts 163 karma points
    Sep 26, 2011 @ 16:37

    Full path disallow

    Great package, and just what I need. In a multi-site environment this is really useful.

    It looks useful for the sitemap URLs, as in your example, but how would you handle full URLs for allow or disallow? The robots.txt has relative paths, like Disallow: /umbraco/.

    Here's the scenario: I would like to disallow everything requested on a specific URL. Say you have a site for a company in Europe; you set up the corporate site and are now working on three country sites. The Belgian, Netherlands and Luxembourg sites are still in development (content being finalized and all). Can this package handle URL exclusion?

    The client prefers to publish everything already, but use a staging URL. I know that's possible, but because it's all published, we need to exclude those URLs in the robots.txt, otherwise Google will start indexing temporary pages, which is not a good idea.

    So, is there a way to say:

    Disallow: http://dev.mysite.be/

    Best regards,

    Sander Houttekier

  • Sebastiaan Janssen 5045 posts 15476 karma points MVP admin hq
    Sep 26, 2011 @ 16:57

    You would do that like so:

    User-agent: *
    Disallow: /

    But it's not exactly "safe". What we usually do is put Windows authentication on the temporary site, so that no links to the site can ever appear in Google.
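
    For reference, a minimal sketch of that lock-down (my assumption of the setup, not part of the package): with Windows authentication enabled on the IIS site, the staging site's web.config can deny all anonymous users:

    <system.web>
      <!-- Sketch: require Windows authentication and deny anonymous users -->
      <authentication mode="Windows" />
      <authorization>
        <deny users="?" /> <!-- "?" matches all anonymous users -->
      </authorization>
    </system.web>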

  • Sander Houttekier 114 posts 163 karma points
    Sep 26, 2011 @ 17:02

    Ah, but there's the problem: I can only use this technique if the staging sites are in a different Umbraco installation.

    I knew this was possible, but due to budget-related decisions the client has only one Umbraco installation, with multiple sites in it. Some of them already have live hostnames; others are still being implemented content-wise.

    The single robots.txt cannot handle Disallow: http://mysite.com/, but placing Disallow: / would disallow it for every site in that Umbraco installation, including the ones that are live.

  • Sebastiaan Janssen 5045 posts 15476 karma points MVP admin hq
    Sep 26, 2011 @ 17:05

    Ah, I see. Well, in that case I should release the source so you can make your own hack to allow for this... :) I'll try to do so this evening!

  • Sander Houttekier 114 posts 163 karma points
    Sep 26, 2011 @ 17:30

    That would be great.

    I know the situation is not optimal; if the money were there, we would have two Umbraco installations with a Courier connection in between for staging versus live. As explained, that has been an issue with this project.

    As another approach, I've also been thinking about adding URL rewriting for those specific URLs: redirecting to another robots.txt with only Disallow: / in it. So far that has been unsuccessful, which is why I went looking for your package :)
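
    A minimal sketch of that rewrite idea, assuming a hypothetical staging host (dev.mysite.be) and file name (robots-staging.txt), hooked into Global.asax.cs in the classic System.Web pipeline:

    using System;
    using System.Web;

    public class Global : HttpApplication
    {
        protected void Application_BeginRequest(object sender, EventArgs e)
        {
            // Only intercept robots.txt requests arriving on the staging host
            if (Request.Url.Host.Equals("dev.mysite.be", StringComparison.OrdinalIgnoreCase) &&
                Request.Path.Equals("/robots.txt", StringComparison.OrdinalIgnoreCase))
            {
                // robots-staging.txt would contain only:
                //   User-agent: *
                //   Disallow: /
                Context.RewritePath("~/robots-staging.txt");
            }
        }
    }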

  • Sebastiaan Janssen 5045 posts 15476 karma points MVP admin hq
    Sep 27, 2011 @ 09:51

    Well, it turns out the source for this is extremely simple. What you'd need to do, I think, is write only the disallow rule (and nothing else) based on the HTTP_HOST. I'm not sure how you would make this configurable, but you could probably just hardcode the domain(s) in for now:

    using System.IO;
    using System.Web;

    namespace Cultiv.DynamicRobots
    {
        public class RobotsTxt : IHttpHandler
        {
            public void ProcessRequest(HttpContext context)
            {
                context.Response.ContentType = "text/plain";

                // Locate the robots.txt template in the site root
                var robotsTemplate = context.Server.MapPath(VirtualPathUtility.ToAbsolute("~/robots.txt"));

                if (File.Exists(robotsTemplate))
                {
                    // Replace the {HTTP_HOST} placeholder with the host
                    // name of the current request, then write the result
                    using (var streamReader = File.OpenText(robotsTemplate))
                    {
                        var input = streamReader.ReadToEnd();
                        context.Response.Write(input.Replace("{HTTP_HOST}", context.Request.ServerVariables["HTTP_HOST"]));
                    }
                }
                else
                {
                    // No template found: serve an empty robots.txt
                    context.Response.Write("");
                }
            }

            public bool IsReusable
            {
                get { return true; }
            }
        }
    }
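
    A minimal sketch of that modification, assuming hypothetical hardcoded staging hosts (dev.mysite.be and friends): requests arriving on one of those hosts get only a blanket disallow, while every other host falls through to the normal template:

    using System;
    using System.IO;
    using System.Linq;
    using System.Web;

    namespace Cultiv.DynamicRobots
    {
        public class RobotsTxt : IHttpHandler
        {
            // Hypothetical staging hosts that should never be indexed
            private static readonly string[] StagingHosts = { "dev.mysite.be", "dev.mysite.nl" };

            public void ProcessRequest(HttpContext context)
            {
                context.Response.ContentType = "text/plain";

                var host = context.Request.ServerVariables["HTTP_HOST"];

                // Staging hosts get only the blanket disallow rule
                if (StagingHosts.Contains(host, StringComparer.OrdinalIgnoreCase))
                {
                    context.Response.Write("User-agent: *\nDisallow: /");
                    return;
                }

                // Everything else is served from the template, as above
                var robotsTemplate = context.Server.MapPath(VirtualPathUtility.ToAbsolute("~/robots.txt"));
                if (File.Exists(robotsTemplate))
                {
                    using (var streamReader = File.OpenText(robotsTemplate))
                    {
                        context.Response.Write(streamReader.ReadToEnd().Replace("{HTTP_HOST}", host));
                    }
                }
            }

            public bool IsReusable
            {
                get { return true; }
            }
        }
    }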