Editing CultivSearchEngineSitemap To exclude pages with different canonicals
Hi all - newbie to this system, so apologies if this is in the wrong place!
First off, as a precursor to this question: I'm PHP, not ASP.NET, and have picked up a client who uses Umbraco (7.5.3), so I'm having to pick up a few bits along the way. If you're able to help, please be thorough and assume I know nothing about Umbraco or its architecture.
The site uses CultivSearchEngineSitemap to generate an XML sitemap from pages that a) are published, b) are accessible by member access and c) aren't protected.
This is demonstrated in sitemap.cshtml:
foreach (var node in startNode.Children.Where(x => Umbraco.MemberHasAccess(x.Id, x.Path)).Where(x => !Umbraco.IsProtected(x.Id, x.Path)).Where(x => x.IsVisible()))
{
    if (node.TemplateId > 0)
    {
        <url>
            <loc>@node.UrlWithDomain()</loc>
            <lastmod>@(string.Format("{0:s}+00:00", node.UpdateDate))</lastmod>
            @{
                var freq = node.GetPropertyValue<string>("SearchEngineSitemapChangeFreq");
                var pri = node.GetPropertyValue<string>("SearchEngineSitemapPriority");
            }
            @if (!string.IsNullOrEmpty(freq))
            {
                <changefreq>@freq</changefreq>
            }
            @if (!string.IsNullOrEmpty(pri))
            {
                <priority>@pri</priority>
            }
        </url>
    }
}
I have a global field across all pages called CanonicalTag which is a text field.
Ideally, I'd like to compare the page's URL with the canonical tag and exclude any page that has a CanonicalTag entry different from the page URL.
eg:
.Where(CanonicalTag = Null OR CanonicalTag = Url)
The result should exclude any page whose canonical tag field is different from the page's URL, and keep pages where the field matches the URL or just hasn't been filled out. (Our SEO tool reports that pages in a sitemap that have a different canonical are bad for SEO.)
If anyone could offer a nice snippet of code to solve this, that would be great, as I'm not sure of the syntax or how to reference these properly.
Thanks in advance.
I'm afraid I've not tested it, but I suggest it is probably as simple as:
foreach (var node in startNode.Children
    .Where(x => Umbraco.MemberHasAccess(x.Id, x.Path))
    .Where(x => !Umbraco.IsProtected(x.Id, x.Path))
    .Where(x => x.IsVisible())
    // Two new Where clauses: keep only nodes that have a non-empty canonicalTag
    .Where(x => x.HasProperty("canonicalTag"))
    .Where(x => string.IsNullOrEmpty(x.GetPropertyValue<string>("canonicalTag")) == false))
{
    if (node.TemplateId > 0)
    {
        <url>
            <loc>@node.UrlWithDomain()</loc>
            <lastmod>@(string.Format("{0:s}+00:00", node.UpdateDate))</lastmod>
            @{
                var freq = node.GetPropertyValue<string>("SearchEngineSitemapChangeFreq");
                var pri = node.GetPropertyValue<string>("SearchEngineSitemapPriority");
            }
            @if (!string.IsNullOrEmpty(freq))
            {
                <changefreq>@freq</changefreq>
            }
            @if (!string.IsNullOrEmpty(pri))
            {
                <priority>@pri</priority>
            }
        </url>
    }
}
Note the two extra Where clauses; the rest is as supplied.
I'm curious, though, about the need to require a canonicalTag. If you'd prefer to render the canonicalTag if provided, and otherwise use the Url, then you can just output the Url as:
@Umbraco.Coalesce(node.GetPropertyValue("canonicalTag"), node.UrlWithDomain())
Give up the PHP and join the ASP.NET world. It will bring sex, drugs and rock and roll!
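For reference, the comparison described in the question (keep a page when its canonical field is empty or matches its own URL) could be sketched roughly as below. This is untested against a real Umbraco install. SitemapFilters and IsSelfCanonical are made-up names, not part of Umbraco or CultivSearchEngineSitemap, and the sketch assumes the canonical field holds a full URL in the same form that UrlWithDomain() returns.

```csharp
using System;

// Hypothetical helper class; it could live in App_Code or be adapted
// into a @functions block in the view. Not part of Umbraco.
public static class SitemapFilters
{
    // Returns true when the page should stay in the sitemap:
    // the canonical field is empty, or it points at the page's own URL.
    public static bool IsSelfCanonical(string canonicalTag, string pageUrl)
    {
        if (string.IsNullOrWhiteSpace(canonicalTag))
        {
            return true; // nothing entered, so treat the page as its own canonical
        }

        // Assumption: both values are absolute URLs. Ignore case,
        // surrounding spaces and a trailing slash when comparing.
        string canonical = canonicalTag.Trim().TrimEnd('/');
        string url = (pageUrl ?? string.Empty).Trim().TrimEnd('/');
        return string.Equals(canonical, url, StringComparison.OrdinalIgnoreCase);
    }
}

// Possible use in the sitemap view, replacing the two canonicalTag clauses:
//   .Where(x => SitemapFilters.IsSelfCanonical(
//       x.GetPropertyValue<string>("canonicalTag"), x.UrlWithDomain()))
```

If the canonical field stores relative paths rather than full URLs, the comparison would need to use the node's relative Url instead.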
Hi David.
Thanks for your response. The issue has arisen because of the number of products that have been added (duplicated) as children of different categories across the site. An SEO crawl flags any page in the XML sitemap as incorrect if it then visits the page and finds a canonical tag pointing to another page.
I think the issue is that search engines don't see the point in indexing pages that aren't original. While it's great that this plugin creates a sitemap automatically, it's a pain in the rear that there are no options for it.
One of the things I added to the site was a CanonicalTag field to each page which simply allowed us to add .. you guessed it.. a canonical tag to pages that had been duplicated across the site.
This seems to me the most logical way of ensuring pages that aren't original are excluded from the sitemap.
Maybe there's a better way?
With a little modification, this seems to work (ish). The "ish" could be because not all the canonical tags have been added yet, so I will report back on that!
However, I need to add a condition that excludes pages from the sitemap based on words within the URL.
eg.
.where node.UrlWithDomain() doesNotcontain "thanks" OR "terms" OR "policy"
Just a little syntax help would be perfect!
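One way the pseudocode above might translate into C# is sketched below, untested against a real Umbraco site. SitemapUrlFilters and ContainsAnyWord are hypothetical names introduced here for illustration. The check is a plain case-insensitive substring match, so "terms" would also match a URL such as /terms-of-sale/.

```csharp
using System;
using System.Linq;

// Hypothetical helper class, not part of Umbraco or CultivSearchEngineSitemap.
public static class SitemapUrlFilters
{
    // Returns true when the URL contains at least one of the given words,
    // ignoring case. Empty or null input never matches.
    public static bool ContainsAnyWord(string url, params string[] words)
    {
        if (string.IsNullOrEmpty(url) || words == null)
        {
            return false;
        }

        return words.Any(w => !string.IsNullOrEmpty(w)
            && url.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}

// Possible use as one more clause on the foreach in the sitemap view:
//   .Where(x => !SitemapUrlFilters.ContainsAnyWord(
//       x.UrlWithDomain(), "thanks", "terms", "policy"))
```

The leading ! is what turns "contains any of these words" into "does not contain any of them", so matching pages are dropped from the sitemap.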
Considering the dark side (it seems a much easier language tbh; I just need a few months to learn the syntax and Umbraco variable names).