URL rewrite and Google
Hi,
How can I stop Google from crawling this path in Umbraco?
http://www.florahotelsindia.com/jp/hotel/kochi-flora-airport-hotel/kochi-flora-airport-hotel-dining/
I already use umbracoUrlAlias and URL rewriting to modify this URL, and the page is already accessible via this URL:
http://www.florahotelsindia.com/jp/kochi-flora-airport-dining.aspx
Please help
Hi Sherry
To tell Googlebot and other bots that there are some pages they should not crawl, you can use a robots.txt file, which you can read more about here: http://www.robotstxt.org/robotstxt.html
Or you can use <meta name="robots" content="noindex,nofollow" />, which you can read more about here: http://googlewebmastercentral.blogspot.com/2007/03/using-robots-meta-tag.html
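For example, a robots.txt placed at the site root that blocks the duplicate /jp/hotel/ paths could look something like this (the Disallow rule is only an illustration, adjust it to the paths you actually want kept out):
User-agent: *
Disallow: /jp/hotel/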
Hope this is what you're after.
/Jan
Hi,
I would add a canonical tag on every page (in the page master), e.g.:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
See http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
It tells Google which URL should be indexed for the given page, so only one of the URLs gets indexed.
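If it helps, in a legacy Umbraco masterpage the tag usually goes once in the <head>; a rough sketch only, assuming the old nodeFactory/umbraco.library API and that NiceUrl returns a relative path on your install:
<%-- Sketch: canonical built from the current node's nice URL, prefixed with the preferred domain --%>
<link rel="canonical" href="http://www.florahotelsindia.com<%= umbraco.library.NiceUrl(umbraco.presentation.nodeFactory.Node.GetCurrent().Id) %>" />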
/Martin
Hi Martin,
I already have those canonicals in my pages. That's why I'm wondering why I still get those errors.
Anyway, how can I exclude this path generated by Umbraco?
/ir/hotel/dubai-flora-grand-hotel/photo-gallery/
Hi Sherry,
OK. Unfortunately, I don't know how to stop pages from being accessible through multiple URLs.
But the only reason they are indexed in Google can be that your website exposes them somewhere. So I would check the website for a list of generated URLs to find out which lists, modules and so on create the problematic links.
I use Xenu to crawl websites for links: http://home.snafu.de/tilman/xenulink.html (the site looks odd, but it's OK to download the program :) )
Regards,
Martin
Hi Martin,
Thanks for that. I was able to narrow down some of the issues I'm encountering and find the suspect links.
One thing that concerns me is that this page
http://www.florahotelsindia.com/ar/kochi-flora-airport-hotel.aspx is giving me a 500 error, which I don't understand, since it loads fine when I open it in the browser.
In our template I have this code.
// Redirect requests for the bare domain (with or without www) to the default hotel page
string url = Request.Url.ToString();
if (url == "http://www.florahotelsindia.com/" ||
    url == "http://florahotelsindia.com/" ||
    url == "http://florahotelsindia.com")
{
    Response.Redirect("kochi-flora-airport-hotel.aspx");
}
This redirects the other domain to a specific page.
I'm not sure if this is related to the 500 error I'm getting.
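For what it's worth, comparing the full URL string is brittle (it misses https, trailing-slash and casing variants). A rough alternative sketch that only compares host and path, where the page name is simply taken from the snippet above and RedirectPermanent requires .NET 4:
// Hypothetical host/path based check instead of a full URL string comparison
string host = Request.Url.Host.ToLowerInvariant();
if (Request.Url.AbsolutePath == "/" &&
    (host == "florahotelsindia.com" || host == "www.florahotelsindia.com"))
{
    // 301 so crawlers consolidate on one address; use Response.Redirect on older .NET
    Response.RedirectPermanent("kochi-flora-airport-hotel.aspx");
}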