full path disallow
Great package, and just what I need... in a multi-site environment this is really useful.
It looks useful for the sitemap URLs, like in your example, but how would you handle full URLs for allow or disallow?
The robots.txt has relative paths like Disallow: /umbraco/.
But in the following scenario I would like to disallow everything requested on a specific URL.
Say you have a site for a company in Europe: you set up the corporate site and are now working on 3 country sites.
The Belgian, Dutch and Luxembourg sites are still in development (content being finalized and all).
Can this package handle URL exclusion?
The client prefers to publish everything already, but use a staging URL.
I know that's possible, but because it's all published we need to exclude those URLs via the robots.txt, otherwise Google will start indexing temporary pages, which is not really a good idea.
So, is there a way to say
Disallow: http://dev.mysite.be/
Best regards,
Sander Houttekier
You would do that like so:
User-agent: *
Disallow: /
But it's not exactly "safe"; what we usually do is put Windows authentication on the temporary site so that no links to the site can ever appear in Google.
Ah, but there is the problem: I can only use this technique if the staging sites are in a different Umbraco.
I knew this was possible; however, due to budget-related decisions the client has only one Umbraco
with multiple sites in it. Some of them already have live hostnames, others are still being implemented content-wise.
The one robots.txt cannot handle Disallow: http://mysite.com/,
and placing Disallow: / would disallow everything for every site in that Umbraco, including the ones that are live.
Ah I see, well in that case I should release the source so you can make your own hack in it to allow for this... :) I'll try to do so this evening!
That would be great!
I know the situation is not optimal; if the money was there we would have two Umbracos with a Courier connection in between for staging versus live.
As explained, that has been an issue with this project.
I've been thinking about adding URL rewriting for those specific URLs too, as another approach to the issue: redirecting towards another robots.txt with only Disallow: / in it. But so far that has been unsuccessful, which is why I went looking for your package :)
Well, it turns out the source for this is extremely simple, so what you'd need to do, I think, is check the HTTP_HOST and write only the disallow rule and nothing else. Not sure how you would make this configurable, but you could probably just hardcode the domain(s) in for now:
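using System.IO;
using System.Web;

namespace Cultiv.DynamicRobots
{
    public class RobotsTxt : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            context.Response.ContentType = "text/plain";

            // The robots.txt in the site root is used as a template
            var robotsTemplate = HttpContext.Current.Server.MapPath(VirtualPathUtility.ToAbsolute("~/robots.txt"));
            if (File.Exists(robotsTemplate))
            {
                // Replace the {HTTP_HOST} placeholder with the hostname of the current request
                using (var streamReader = File.OpenText(robotsTemplate))
                {
                    var input = streamReader.ReadToEnd();
                    context.Response.Write(input.Replace("{HTTP_HOST}", HttpContext.Current.Request.ServerVariables["HTTP_HOST"]));
                }
            }
            else
            {
                context.Response.Write("");
            }
        }

        public bool IsReusable
        {
            get { return true; }
        }
    }
}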
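The package doesn't do this out of the box, but as a rough sketch of the hack described above — assuming hypothetical staging hostnames such as dev.mysite.be, dev.mysite.nl and dev.mysite.lu — the handler could short-circuit for those hosts and serve only a blanket disallow:

using System;
using System.IO;
using System.Web;

namespace Cultiv.DynamicRobots
{
    public class RobotsTxt : IHttpHandler
    {
        // Hypothetical list of staging hostnames that should never be indexed
        private static readonly string[] StagingHosts = { "dev.mysite.be", "dev.mysite.nl", "dev.mysite.lu" };

        public void ProcessRequest(HttpContext context)
        {
            context.Response.ContentType = "text/plain";

            var host = context.Request.ServerVariables["HTTP_HOST"];

            // For staging hostnames, write only a blanket disallow and stop
            if (Array.Exists(StagingHosts, h => string.Equals(h, host, StringComparison.OrdinalIgnoreCase)))
            {
                context.Response.Write("User-agent: *\nDisallow: /");
                return;
            }

            // For every other hostname, fall through to the normal template behaviour
            var robotsTemplate = context.Server.MapPath(VirtualPathUtility.ToAbsolute("~/robots.txt"));
            if (File.Exists(robotsTemplate))
            {
                using (var streamReader = File.OpenText(robotsTemplate))
                {
                    context.Response.Write(streamReader.ReadToEnd().Replace("{HTTP_HOST}", host));
                }
            }
        }

        public bool IsReusable
        {
            get { return true; }
        }
    }
}

When a staging site goes live, removing its hostname from that hardcoded list (or, later, moving the list into a config setting) would be enough to switch it back to the normal robots.txt template.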