  • n00b 33 posts 134 karma points
    May 10, 2017 @ 09:55

    Editing CultivSearchEngineSitemap to exclude pages with different canonicals

    Hi all, newbie to this system here, so apologies if this is in the wrong place!

    First off, as a precursor to this question: I'm a PHP developer, not ASP.NET, and have picked up a client who uses Umbraco (7.5.3), so I'm having to pick up a few bits along the way. If you're able to help, please be thorough and assume I know nothing about Umbraco or its architecture.

    The site uses CultivSearchEngineSitemap to generate an XML sitemap from pages that are a) published, b) accessible to the current member and c) not protected.

    This is demonstrated in sitemap.cshtml:

    foreach (var node in startNode.Children
            .Where(x => Umbraco.MemberHasAccess(x.Id, x.Path))  // visitor may access the page
            .Where(x => !Umbraco.IsProtected(x.Id, x.Path))      // page isn't protected
            .Where(x => x.IsVisible()))                          // umbracoNaviHide isn't ticked
    {
        // Only pages with a template can be rendered, so only those are listed
        if (node.TemplateId > 0)
        {
            <url>
                <loc>@node.UrlWithDomain()</loc>
                <lastmod>@(string.Format("{0:s}+00:00", node.UpdateDate))</lastmod>
                @{
                    var freq = node.GetPropertyValue<string>("SearchEngineSitemapChangeFreq");
                    var pri = node.GetPropertyValue<string>("SearchEngineSitemapPriority");
                }

                @if (!string.IsNullOrEmpty(freq))
                {
                    <changefreq>@freq</changefreq>
                }
                @if (!string.IsNullOrEmpty(pri))
                {
                    <priority>@pri</priority>
                }
            </url>
        }
    }

    I have a global field across all pages called CanonicalTag which is a text field.

    Ideally, I'd like to compare the page's URL with the canonical tag and exclude any page that has a CanonicalTag entry different from the page URL.

    e.g.:

    .Where(CanonicalTag = Null OR CanonicalTag = Url)

    The result should exclude any page whose canonical tag field differs from the page's URL, while keeping pages where it just hasn't been filled out (our SEO tool reports that pages in a sitemap that have a different canonical are bad for SEO).
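    The rule above (keep a page when its canonical tag is empty or equal to its own URL, drop it otherwise) can be sketched as a small plain C# predicate, so it can be tested outside Umbraco. SitemapCanonicalFilter and IsSelfCanonical are hypothetical names, and ignoring case, surrounding whitespace and a trailing slash when comparing is an assumption about this site's URLs, not something Umbraco does for you.

```csharp
using System;

// Hypothetical helper (not part of Umbraco or CultivSearchEngineSitemap):
// decides whether a page belongs in the sitemap based on its canonical tag.
public static class SitemapCanonicalFilter
{
    // True when the canonical tag is empty (page treated as the original)
    // or points at the page's own URL; false when it points elsewhere.
    public static bool IsSelfCanonical(string canonicalTag, string pageUrl)
    {
        if (string.IsNullOrWhiteSpace(canonicalTag))
        {
            return true; // field not filled out: keep the page
        }
        return string.Equals(Normalize(canonicalTag), Normalize(pageUrl),
                             StringComparison.OrdinalIgnoreCase);
    }

    // Assumption: surrounding whitespace and a trailing slash are not
    // meaningful differences for this site's URLs.
    private static string Normalize(string url)
    {
        return (url ?? string.Empty).Trim().TrimEnd('/');
    }
}
```

    Assuming the class is compiled into the site (for example from a .cs file in App_Code) and the property alias is canonicalTag, it could be appended to the existing Where chain as .Where(x => SitemapCanonicalFilter.IsSelfCanonical(x.GetPropertyValue<string>("canonicalTag"), x.UrlWithDomain())). If editors enter relative canonicals such as /products/widget/, comparing against x.Url instead of x.UrlWithDomain() would likely be the closer match.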

    If anyone could offer a nice snippet of code to solve this, that would be great, as I'm not sure of the syntax or how to reference these properly.

    Thanks in advance.

  • David Peck 690 posts 1896 karma points c-trib
    May 10, 2017 @ 12:59

    I'm afraid I haven't tested it, but I suspect it is probably as simple as:

    foreach (var node in startNode.Children
            .Where(x => Umbraco.MemberHasAccess(x.Id, x.Path))
            .Where(x => !Umbraco.IsProtected(x.Id, x.Path))
            .Where(x => x.IsVisible())
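            // Two new Where clauses: keep only pages that have a canonicalTag
            // property and where that property has a non-empty value
            .Where(x => x.HasProperty("canonicalTag"))
            .Where(x => string.IsNullOrEmpty(x.GetPropertyValue<string>("canonicalTag")) == false))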
    {
        if (node.TemplateId > 0)
        {
            <url>
                <loc>@node.UrlWithDomain()</loc>
                <lastmod>@(string.Format("{0:s}+00:00", node.UpdateDate))</lastmod>
                @{
                    var freq = node.GetPropertyValue<string>("SearchEngineSitemapChangeFreq");
                    var pri = node.GetPropertyValue<string>("SearchEngineSitemapPriority");
                }
    
                @if (!string.IsNullOrEmpty(freq))
                {
                    <changefreq>@freq</changefreq>
                }
                @if (!string.IsNullOrEmpty(pri))
                {
                    <priority>@pri</priority>
                }
            </url>
        }
    }

    Notice that only the two extra Where clauses are new; the rest is as you supplied it.

    I'm curious, though, about the need to require a canonicalTag. If you'd prefer to render the canonicalTag when it's provided, and otherwise use the URL, then you can just output the URL as: @Umbraco.Coalesce(node.GetPropertyValue("canonicalTag"), node.UrlWithDomain())
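    The "canonical if provided, otherwise the URL" choice can be sketched in plain C# as below. CanonicalFallback and CanonicalOrUrl are hypothetical names; this is not Umbraco's implementation of Coalesce, and treating whitespace-only values as empty is this sketch's own assumption.

```csharp
// Hypothetical helper (not Umbraco's implementation): returns the canonical
// URL when one has been entered, otherwise falls back to the page URL.
public static class CanonicalFallback
{
    public static string CanonicalOrUrl(string canonicalTag, string pageUrl)
    {
        // Null, empty and whitespace-only values count as "not filled out"
        return string.IsNullOrWhiteSpace(canonicalTag) ? pageUrl : canonicalTag.Trim();
    }
}
```

    In the <loc> line this would read @(CanonicalFallback.CanonicalOrUrl(node.GetPropertyValue<string>("canonicalTag"), node.UrlWithDomain())); the explicit @( ) is needed because Razor's implicit @ expressions can't contain generics like <string>. One design consequence: if several duplicated pages share one canonical, that same URL would appear in several <loc> entries.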

    Give up the PHP and join the ASP.NET world. It will bring sex, drugs and rock and roll!

  • n00b 33 posts 134 karma points
    May 10, 2017 @ 14:32

    Hi David.

    Thanks for your response. The issue has arisen because of the number of products that have been added (duplicated) as children of different categories across the site. An SEO crawl flags any page in the XML sitemap as incorrect if, on visiting it, the crawler finds a canonical tag pointing to another page.

    I think the reason is that search engines don't see the point in indexing pages that aren't original. While it's great that this plugin creates a sitemap automatically, it's a pain in the rear that it has no options.

    One of the things I added to the site was a CanonicalTag field on each page, which simply allowed us to add... you guessed it... a canonical tag to pages that had been duplicated across the site.

    This seems to me the most logical way of ensuring that pages that aren't original are excluded from the sitemap.

    Maybe there's a better way?

  • n00b 33 posts 134 karma points
    May 11, 2017 @ 08:52

    With a little modification, this seems to work (ish). That could be because not all the canonical tags have been added yet, so I'll report back on that!

    However, I need to add a condition that excludes pages based on words within the URL.

    e.g.:

    .where node.UrlWithDomain() doesNotcontain "thanks" OR "terms" OR "policy"
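    Since a page should be dropped if its URL contains any of these words, the condition is really "doesn't contain thanks AND doesn't contain terms AND doesn't contain policy", which is the negation of "contains any of them". A minimal plain C# sketch of that check is below; SitemapUrlExclusions, ContainsExcludedWord and ExcludedWords are hypothetical names, and the case-insensitive substring match is an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical helper: reports whether a URL contains any excluded word.
public static class SitemapUrlExclusions
{
    // Words taken from the post; extend as needed.
    public static readonly string[] ExcludedWords = { "thanks", "terms", "policy" };

    // True when the URL contains at least one of the words (ignoring case).
    public static bool ContainsExcludedWord(string url, params string[] words)
    {
        if (string.IsNullOrEmpty(url))
        {
            return false;
        }
        return words.Any(w => url.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

    Assuming the class is compiled into the site, the Where chain could gain .Where(x => !SitemapUrlExclusions.ContainsExcludedWord(x.Url, SitemapUrlExclusions.ExcludedWords)). Checking x.Url rather than x.UrlWithDomain() stops the domain name itself from triggering a match; note that a plain substring test also drops any URL that merely contains one of the words, such as a hypothetical /insurance-policy-guide/ product page.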

    Just a little syntax help would be perfect!

    Considering the dark side (it seems a much easier language tbh; I just need a few months to learn the syntax and Umbraco variable names).
