
  • Mitch 44 posts 159 karma points
    Jul 19, 2017 @ 10:38
    Mitch
    0

    UmbracoExamine.PDF indexing published Media

    Hi guys

    I'm using UmbracoExamine.PDF to index PDFs, but I'm now aware that it indexes PDFs in the Media section whether they've been published on a content page or not.

    Is there a way I can index only the PDFs that have been published on a content node?

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 19, 2017 @ 11:04
    Ismail Mayat
    1

    Mitch,

    You could tap into the gathering node event, then work out whether the PDF is being used on a published content node. If the content node is not published, cancel the indexing event.

    You may be able to determine media usage using the Nexu package: https://our.umbraco.org/projects/backoffice-extensions/nexu/

    Regards

    Ismail

  • Mitch 44 posts 159 karma points
    Jul 20, 2017 @ 09:29
    Mitch
    0

    Thanks Ismail. Was hoping there would be a simple config setting I could use. I'll check out that Nexu package. Do you know if it has an API I can code to?

    Thanks

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 20, 2017 @ 09:57
    Ismail Mayat
    101

    Mitch,

    It does. It's actually amazeballs. You install the package, then rebuild the usage data via the dashboard. For most of the common data types, like media picker and RTE, it finds usages of content and media and fills the relations table. The API uses the relations API to show you usage. If you have custom data types you will need to build resolvers for them, but it's all documented.

    So in theory, after installing Nexu and rebuilding, you can use the gathering node event on the PDF index and do a lookup via the ID of the media item you are currently indexing. See https://github.com/dawoe/umbraco-nexu/blob/develop/Source/Our.Umbraco.Nexu.Core.Tests/NexuApiControllerTests.cs. There is an API of sorts, but worst case you can use the relations API, as the package will create relations in the DB.

    I have not done this before but reckon with a bit of work you can solve your issue.

    Regards

    Ismail

  • Dave Woestenborghs 3504 posts 12135 karma points MVP 9x admin c-trib
    Jul 20, 2017 @ 10:00
    Dave Woestenborghs
    0

    Hi Mitch,

    If you can't use my API directly you can always code against the normal Umbraco Relations API. So, like Ismail said, you can probably use it for your case.

    Dave

  • Mitch 44 posts 159 karma points
    Jul 20, 2017 @ 10:20
    Mitch
    0

    Thank you so much guys. Very useful advice. I'll give it a go right now.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 20, 2017 @ 10:44
    Ismail Mayat
    0

    Mitch,

    Let us know how you get on, because this is a very interesting use case. If it all works I am going to include a note on it in the Examine course I wrote, under the PDF indexing exercise.

    Regards

    Ismail

  • Dave Woestenborghs 3504 posts 12135 karma points MVP 9x admin c-trib
    Jul 20, 2017 @ 10:46
    Dave Woestenborghs
    1

    I also have plans to extend the API with methods that can be used in your project for just these kinds of use cases.

    But I need to fix some other things first... and of course find time :-)

    Dave

  • Mitch 44 posts 159 karma points
    Jul 20, 2017 @ 16:03
    Mitch
    1

    So, I had a stab at it and it seems to work! Not much code and I'm sure it can be improved, but here it is...

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Examine;
    using Umbraco.Core;
    using Umbraco.Core.Services;
    using Umbraco.Web;

    public class PdfExamineEvents : ApplicationEventHandler
    {
        private IRelationService _relationService;

        protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
            var helper = new UmbracoHelper(UmbracoContext.Current);

            _relationService = applicationContext.Services.RelationService;

            ExamineManager.Instance.IndexProviderCollection["PDFIndexer"].NodeIndexing += (sender, e) => NodeIndexing(sender, e, helper);
        }

        private void NodeIndexing(object sender, IndexingNodeEventArgs args, UmbracoHelper helper)
        {
            // Cancelling the event keeps the node out of the index
            args.Cancel = !ShouldIndex(args.NodeId, helper);
        }

        private bool ShouldIndex(int nodeId, UmbracoHelper helper)
        {
            // Check if media item is PDF
            if (!IsPdf(nodeId, helper)) return false;

            // Check whether the PDF has any relations at all
            if (!_relationService.IsRelated(nodeId)) return false;

            // If any document with this PDF is published, add to index. If not, cancel
            var relations = _relationService.GetByChildId(nodeId);
            return AnyPublished(relations.Select(r => r.ParentId), helper);
        }

        private static bool IsPdf(int nodeId, UmbracoHelper helper)
        {
            var mediaItem = helper.TypedMedia(nodeId);

            // TypedMedia returns null when the item cannot be found
            if (mediaItem == null) return false;

            // Not sure if every IPublishedContent has this property, so check first
            return mediaItem.HasProperty("umbracoExtension")
                && string.Equals(mediaItem.GetPropertyValue<string>("umbracoExtension"), "pdf", StringComparison.OrdinalIgnoreCase);
        }

        private static bool AnyPublished(IEnumerable<int> nodeIds, UmbracoHelper helper)
        {
            // TypedContent reads the published cache, so non-null means the page is published
            return nodeIds.Any(id => helper.TypedContent(id) != null);
        }
    }
    
  • Dave Woestenborghs 3504 posts 12135 karma points MVP 9x admin c-trib
    Jul 20, 2017 @ 16:07
    Dave Woestenborghs
    0

    Hi Mitch,

    Nice to see that it works.

    Dave

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 20, 2017 @ 16:23
    Ismail Mayat
    0

    Mitch,

    One improvement you could make is to test for the PDF extension using the args provided, so you don't need to instantiate a new media item:

    args.Fields["umbracoExtension"] == "pdf"
    

    Something along those lines.

    Other than that, looks good. Good to see you got it working. I'm adding this as notes to my Examine course ftw!

    Regards

    Ismail
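
    A minimal sketch of the field-based check suggested above, written against a plain dictionary so it does not depend on any Umbraco types. `PdfFieldCheck` is a hypothetical helper name, and the sketch assumes the indexer has already put `umbracoExtension` into `args.Fields` by the time `NodeIndexing` fires, which is worth verifying against your own index:

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper, not part of Umbraco or Examine.
    public static class PdfFieldCheck
    {
        // Returns true only when the field exists and equals "pdf" (case-insensitive).
        public static bool IsPdf(IDictionary<string, string> fields)
        {
            if (fields == null) return false;

            string extension;
            // TryGetValue avoids a KeyNotFoundException when the field is absent
            return fields.TryGetValue("umbracoExtension", out extension)
                && string.Equals(extension, "pdf", StringComparison.OrdinalIgnoreCase);
        }
    }
    ```

    In the handler posted earlier, `PdfFieldCheck.IsPdf(args.Fields)` could stand in for the `IsPdf(nodeId, helper)` call, which avoids the media cache lookup for every node.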

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 21, 2017 @ 08:30
    Ismail Mayat
    0

    Mitch,

    One more thing: you will need to handle the content unpublish and delete events. So if page A contains PDF B and page A is then unpublished, you will need to tap into that event and remove PDF B from the index.

    Regards

    Ismail

  • Mitch 44 posts 159 karma points
    Jul 25, 2017 @ 08:30
    Mitch
    0

    Thanks Ismail. I'll make those improvements. Glad you can find some use for it too!

  • Mitch 44 posts 159 karma points
    Jul 25, 2017 @ 13:32
    Mitch
    0

    For the sake of completeness, here is my code to remove a PDF from an index if the page it is on becomes unpublished...

    private void ContentService_UnPublished(IPublishingStrategy sender, PublishEventArgs<IContent> e)
    {
        // Assumes _relationService and _helper are fields assigned in ApplicationStarted
        var pdfIndexer = ExamineManager.Instance.IndexProviderCollection["PDFIndexer"];

        foreach (var item in e.PublishedEntities)
        {
            // Get all relations where the current node is the parent
            var relations = _relationService.GetByParentId(item.Id).ToList();

            foreach (var relation in relations)
            {
                var mediaNode = _helper.TypedMedia(relation.ChildId);
                if (mediaNode == null) continue;

                // GetPropertyValue returns null for a missing property instead of throwing
                var extension = mediaNode.GetPropertyValue<string>("umbracoExtension");
                if (!string.Equals(extension, "pdf", StringComparison.OrdinalIgnoreCase)) continue;

                // Get all relations for this PDF not including the current relation
                var otherRelations = _relationService.GetByChildId(mediaNode.Id).Where(rel => rel.ParentId != relation.ParentId);

                // If this PDF is used on other published pages, do nothing and leave PDF in index
                if (otherRelations.Any(rel => _helper.TypedContent(rel.ParentId) != null)) continue;

                pdfIndexer.DeleteFromIndex(mediaNode.Id.ToString());
            }
        }
    }
    

    Any suggestions for improvements to this code are most welcome.
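
    The snippet above does not show how the handler gets attached. A rough sketch of the wiring, assuming Umbraco 7 and the same `ApplicationEventHandler` pattern used earlier in the thread. The class name `PdfUnpublishEvents` is hypothetical, and `_helper` is assumed to be a field holding the `UmbracoHelper` created at startup:

    ```csharp
    using Umbraco.Core;
    using Umbraco.Core.Events;
    using Umbraco.Core.Models;
    using Umbraco.Core.Publishing;
    using Umbraco.Core.Services;
    using Umbraco.Web;

    // Hypothetical class name; this could equally live in PdfExamineEvents.
    public class PdfUnpublishEvents : ApplicationEventHandler
    {
        private IRelationService _relationService;
        private UmbracoHelper _helper;

        protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
            _relationService = applicationContext.Services.RelationService;
            _helper = new UmbracoHelper(UmbracoContext.Current);

            // ContentService.UnPublished is a static event in Umbraco 7
            ContentService.UnPublished += ContentService_UnPublished;
        }

        private void ContentService_UnPublished(IPublishingStrategy sender, PublishEventArgs<IContent> e)
        {
            // Body as posted above: look up related PDFs and remove them from the index
        }
    }
    ```

    Deleting a page is a separate event from unpublishing it, so the delete case Ismail mentions would need its own handler along the same lines.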
