

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 13, 2010 @ 14:03

    Examine: inject PDF index items into the content index

    Shannon,

    I am trying your suggestion from the PDF indexing topic:

    "You could do it a bit 'dodgy' and just listen to the indexed event of the PDF indexer and add the results to your Content Indexer using ReIndexNode

    This means that you'll have PDF data in two indexes... but it would be very little code to write."

    This is what I have so far:

     ExamineManager.Instance.IndexProviderCollection[PdfIndex].NodeIndexed += ExamineEvents_NodeIndexed;

    and the handler is:

    void ExamineEvents_NodeIndexed(object sender, IndexedNodeEventArgs e)
    {

    }

    What do I do at this point to get the PDF content and add it to the content index? IndexedNodeEventArgs does not have a Fields dictionary for me to get the PDF content from. Also, the ReIndexNode method, which will be called on the content index, takes a LINQ to XML element; where do I get that from?

    Do I need to cast sender to something?

    Regards

    Ismail

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 13, 2010 @ 15:33

    In answer to my own question, here is how I hacked this:

    I changed the event I handle from NodeIndexed to NodeIndexing, whose event args expose a Fields dictionary. In that handler I have:

    string pdfContent;

    // TryGetValue sets pdfContent to null when the field is missing, so check for null as well as empty
    if (e.Fields.TryGetValue(PdfIndexContentFieldAlias, out pdfContent) && !string.IsNullOrEmpty(pdfContent))
    {
        XElement mediaXml = GetMediaItem(e.NodeId);
        mediaXml.Add(new XElement("contents", pdfContent));
        ExamineManager.Instance.IndexProviderCollection[WoiIndexer].ReIndexNode(mediaXml, IndexTypes.Media);
    }

    and that injects the extracted PDF content nicely. I also have a handler for IndexDeleted, so when a media item is deleted it is removed from the content index as well. The one downside is that if I rebuild the content index, I need to make sure I also rebuild the PDF index to synchronise the two.
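Because the PDF text only reaches the content index through the PDF indexer's NodeIndexing event, the two rebuilds have to happen in a fixed order. A minimal sketch, assuming the index providers expose Examine's `RebuildIndex()` method; `IndexSync`, `ContentIndexer` and `PdfIndexer` are hypothetical names, so substitute the provider names from your own ExamineSettings.config:

```csharp
using Examine;

public static class IndexSync
{
    // Hypothetical provider names: replace with the names in ExamineSettings.config.
    private const string ContentIndexName = "ContentIndexer";
    private const string PdfIndexName = "PdfIndexer";

    // Rebuild the content index first (this drops the injected PDF documents),
    // then rebuild the PDF index so its NodeIndexing handler injects them again.
    public static void RebuildContentAndPdf()
    {
        ExamineManager.Instance.IndexProviderCollection[ContentIndexName].RebuildIndex();
        ExamineManager.Instance.IndexProviderCollection[PdfIndexName].RebuildIndex();
    }
}
```

If indexing runs asynchronously in your configuration, the second rebuild may overlap the first, so it is worth checking the resulting content index rather than assuming the order held.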

    Regards

     

    Ismail

  • Andrew Waegel 126 posts 126 karma points
    Oct 13, 2010 @ 22:40

    Ismail, what file does that code live in? Is it a class in an external project that's built and then copied over to the Umbraco installation?

    EDIT: I think I answered my own question: in Shannon's demo code from CG 2010 there is a separate class file, ExamineEvents.cs, that subscribes to the various Examine events and acts on them.

     

  • Andrew Waegel 126 posts 126 karma points
    Oct 14, 2010 @ 02:45

    So WoiIndexer is your content index, and calling ReIndexNode on that with the xml you grabbed and modified from the PDF index created a new 'document' in that index with the PDF content?

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 14, 2010 @ 10:18

    Andrew,

    The code lives in its own class/project and, as you have rightly deduced, goes into the Umbraco bin. It makes use of Examine events on the different indexes. In this example WoiIndexer is my content index and the PDF index holds the PDF content, which I inject into the content index.

    There are two things to note with this. The first is that if you rebuild the content index, you then have to rebuild the PDF index, or else the PDF content will be missing. The second is that you have two lots of data; however, so long as you don't have shed loads of PDFs, it's not really that big an issue. There are potentially three other ways round this problem:

    1. Create your own indexer. You will probably need your own config so that you can tell it which indexes to mix; however, you still have the duplicate data issue.

    2. Create your own searcher. This is quite a bit of coding.

    3. Do two searches, one on content and the other on PDFs, and concatenate the results. However, ranking by score will not work: each index's result set is ranked on its own, not as a collective.
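The ranking problem in option 3 can be illustrated in plain C#, without Examine. This is only an illustration: the `Hit` record and the sample scores are hypothetical, standing in for one search against the content index and one against the PDF index. Concatenation keeps each list's own order; sorting the combined list by raw score interleaves them, but because scores from two separate Lucene indexes are computed against different term statistics, they are generally not directly comparable, so even that order is approximate.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical search hit: an id plus the score its own index assigned to it.
public record Hit(string Id, double Score);

public static class ResultMerging
{
    // Concatenation: each index's results stay ranked on their own,
    // so every PDF hit comes after every content hit, whatever its score.
    public static List<Hit> Concatenate(IEnumerable<Hit> content, IEnumerable<Hit> pdf) =>
        content.Concat(pdf).ToList();

    // Sort the combined list by raw score, highest first.
    // The scores come from different indexes, so this ordering is only approximate.
    public static List<Hit> MergeByScore(IEnumerable<Hit> content, IEnumerable<Hit> pdf) =>
        content.Concat(pdf).OrderByDescending(h => h.Score).ToList();
}
```

With content hits page-1 (0.9) and page-2 (0.2) and a PDF hit doc-1 (0.7), `Concatenate` returns page-1, page-2, doc-1, whereas `MergeByScore` returns page-1, doc-1, page-2.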

    This way, though not ideal, has been fairly straightforward to implement. If you need more info, just Skype me on ismail_mayat.

    Regards

    Ismail

  • Andrew Waegel 126 posts 126 karma points
    Oct 14, 2010 @ 22:48

    The only thing I can't work out is

    GetMediaItem(e.NodeId);

    Where/what is GetMediaItem? I found a version in the UmbracoHelper library, but it doesn't seem to return the right thing.

    Regards,
    - Andrew

  • Andrew Waegel 126 posts 126 karma points
    Oct 14, 2010 @ 23:03

    Actually, if you could see your way clear to posting the class file you're using here, it would really help out .NET hacks like me who are a little unsure about topics like events and delegates. I did find a tutorial that got me to about 90% comprehension of the concept, but working examples are most constructive.

    Cheers,
    - Andrew

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 18, 2010 @ 10:06

    Andrew,

    I posted cut-down code because there is other stuff in those classes, specific to the project I am working on, that would just confuse matters. With regard to the GetMediaItem code, it's:

            private XElement GetMediaItem(int nodeId)
            {
                var nodes = umbraco.library.GetMedia(nodeId, false);
                return XElement.Parse(nodes.Current.OuterXml);
            }

    It's pretty simple: I get the media item as a LINQ to XML node, inject the PDF content element into it, and then pass it to the indexer, which does all the rest.
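The per-node logic can be pulled out of the event handler and tried without Umbraco. A minimal sketch using only the base class library: `PdfInjection` and `BuildContentIndexXml` are hypothetical names, the dictionary stands in for `e.Fields`, and the sample media XML mentioned below is made up rather than taken from a real `GetMedia` call.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class PdfInjection
{
    // Looks up the extracted PDF text in the indexing fields and, if present,
    // appends it to the media XML as a <contents> child element.
    // Returns null when there is nothing to inject, so the caller can skip ReIndexNode.
    public static XElement BuildContentIndexXml(
        IDictionary<string, string> fields, string fieldAlias, XElement mediaXml)
    {
        // TryGetValue leaves the out value null when the key is missing,
        // so check for null as well as empty.
        if (!fields.TryGetValue(fieldAlias, out var pdfText) || string.IsNullOrEmpty(pdfText))
            return null;

        mediaXml.Add(new XElement("contents", pdfText));
        return mediaXml;
    }
}
```

For example, a dictionary containing `FileTextContent` plus the element `<File id="1234"/>` gives back that element with a `<contents>` child holding the text, while an empty dictionary gives back null. XElement escapes characters such as `<` and `&` in the text when it is serialized, so raw extracted text can be added this way.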

    Regards

    Ismail

  • Andrew Waegel 126 posts 126 karma points
    Oct 18, 2010 @ 21:07

    Thanks once again Ismail, I got it working. For any other .NET hacks/beginners out there, here's the class file that works for me.

    MyPublicIndexer = the default index
    MyPublicPDFIndexer = the supplemental PDF indexer, defined in ExamineSettings.config & ExamineIndex.config

    Note that I'm adding the PDF file contents to the main index as a 'content' index type so it will show up with my main lucene search.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Xml;
    using System.Xml.Linq;
    using umbraco.BusinessLogic;
    using Examine;
    using UmbracoExamine;
    using umbraco.presentation.nodeFactory;
    using System.Text;
    using umbraco.cms.businesslogic;
    using umbraco.cms.businesslogic.web;

    namespace My_Controls
    {
        public class ExamineEvents : ApplicationBase
        {
            public ExamineEvents()
            {
                // Add event handler for 'NodeIndexing' on 'MyPublicPDFIndexer'
                // Handler calls the class method 'ExamineEvents_NodeIndexing' below
                ExamineManager.Instance.IndexProviderCollection["MyPublicPDFIndexer"].NodeIndexing += new EventHandler<IndexingNodeEventArgs>(ExamineEvents_NodeIndexing);
                // simple example of how to write to the debug log
                Log.Add(LogTypes.Debug, 0, "in ExamineEvents Constructor");
            }

            // helper method to fetch node as XElement
            private XElement GetMediaItem(int nodeId)
            {
                var nodes = umbraco.library.GetMedia(nodeId, false);
                return XElement.Parse(nodes.Current.OuterXml);
            }

            /// <summary>
            /// Event handler fired when Examine is indexing a node
            /// </summary>
            void ExamineEvents_NodeIndexing(object sender, IndexingNodeEventArgs e)
            {
                // "I am here" logging. Don't laugh; it helped me :)
                Log.Add(LogTypes.Debug, 0, "in ExamineEvents_NodeIndexing");

                // try to get the indexed PDF content; FileTextContent is where the UmbracoExamine.PDF indexer puts it by default.
                // TryGetValue sets pdfContent to null if the field is missing, so check for null as well as empty.
                string pdfContent;
                if (e.Fields.TryGetValue("FileTextContent", out pdfContent) && !string.IsNullOrEmpty(pdfContent))
                {
                    // If we found some content, add it to the main content index in a field called "contents"
                    XElement mediaXml = GetMediaItem(e.NodeId);
                    mediaXml.Add(new XElement("contents", pdfContent));
                    ExamineManager.Instance.IndexProviderCollection["MyPublicIndexer"].ReIndexNode(mediaXml, IndexTypes.Content);
                }
            }
        }
    }
  • Allan James 20 posts 40 karma points
    Jan 22, 2012 @ 18:21
  • David Conlisk 432 posts 1008 karma points
    Jun 22, 2013 @ 14:20

    Thanks Ismail and Andrew, your posts got me there in the end! :) #h5yr

  • Dan Evans 629 posts 1016 karma points
    Jan 28, 2014 @ 12:47

    I'm trying to get this working in Umbraco 7. Any ideas why it wouldn't?

     

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 28, 2014 @ 15:42

    Dan,

    What errors are you getting? Anything in the log file? Also, I thought the PDF indexer was removed at some point?

    Regards

    Ismail

  • Dan Evans 629 posts 1016 karma points
    Jan 28, 2014 @ 17:32

    No errors in log table or log file. The event is not firing. I updated the code to use the new 6.1.0+ method - http://our.umbraco.org/documentation/Reference/Events/application-startup

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Xml;
    using System.Xml.Linq;
    using umbraco.BusinessLogic;
    using Examine;
    using UmbracoExamine;
    using umbraco.presentation.nodeFactory;
    using System.Text;
    using umbraco.cms.businesslogic;
    using umbraco.cms.businesslogic.web;
    using Umbraco.Core;

    namespace Umbraco.Extensions.EventHandlers
    {
        public class RegisterEvents : ApplicationEventHandler
        {
            protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
            {
                // Add event handler for 'NodeIndexing' on 'PDFIndexer'
                // Handler calls the class method 'ExamineEvents_NodeIndexing' below
                // ExamineManager.Instance.IndexProviderCollection["PDFIndexer"].NodeIndexing
                //    += new EventHandler<IndexingNodeEventArgs>(ExamineEvents_NodeIndexing);

                ExamineManager.Instance.IndexProviderCollection["PDFIndexer"].NodeIndexing += ExamineEvents_NodeIndexing;

                // simple example of how to write to the debug log
                //Log.Add(LogTypes.Debug, 0, "in ExamineEvents Constructor");
                Umbraco.Core.Logging.LogHelper.Debug(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType, "in ExamineEvents Constructor");
            }

            // helper method to fetch node as XElement
            private XElement GetMediaItem(int nodeId)
            {
                var nodes = umbraco.library.GetMedia(nodeId, false);
                return XElement.Parse(nodes.Current.OuterXml);
            }

            /// <summary>
            /// Event handler fired when Examine is indexing a node
            /// </summary>
            void ExamineEvents_NodeIndexing(object sender, IndexingNodeEventArgs e)
            {
                // "I am here" logging. Don't laugh; it helped me :)
                //Log.Add(LogTypes.Debug, 0, "in ExamineEvents_NodeIndexing");
                Umbraco.Core.Logging.LogHelper.Debug(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType, "in ExamineEvents_NodeIndexing");

                // try to get the indexed PDF content; FileTextContent is where the UmbracoExamine.PDF indexer puts it by default.
                // TryGetValue sets pdfContent to null if the field is missing, so check for null as well as empty.
                string pdfContent;
                if (e.Fields.TryGetValue("FileTextContent", out pdfContent) && !string.IsNullOrEmpty(pdfContent))
                {
                    // If we found some content, add it to the main content index in a field called "contents"
                    XElement mediaXml = GetMediaItem(e.NodeId);
                    mediaXml.Add(new XElement("contents", pdfContent));
                    ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"].ReIndexNode(mediaXml, IndexTypes.Content);
                }
            }
        }
    }

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 28, 2014 @ 17:49

    Does the event fire for your external index? I just want to narrow down whether only the PDF index is having the issue.
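One way to run that check is a throwaway handler that logs NodeIndexing on both indexes. A minimal sketch, assuming Umbraco 7's `ApplicationEventHandler` and `LogHelper` and the index names used in the code above; `IndexingDiagnostics` is a hypothetical class name. It also warns when a name does not match any configured provider, since the provider collection returns null for an unknown name:

```csharp
using Examine;
using Umbraco.Core;
using Umbraco.Core.Logging;

namespace Umbraco.Extensions.EventHandlers
{
    // Hypothetical diagnostic handler: logs every NodeIndexing event on each
    // named index and warns if a name does not match a configured provider.
    public class IndexingDiagnostics : ApplicationEventHandler
    {
        protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
            foreach (var indexName in new[] { "ExternalIndexer", "PDFIndexer" })
            {
                var provider = ExamineManager.Instance.IndexProviderCollection[indexName];
                if (provider == null)
                {
                    LogHelper.Warn<IndexingDiagnostics>("No Examine index provider named " + indexName);
                    continue;
                }

                var name = indexName; // local copy captured by the lambda
                provider.NodeIndexing += (sender, e) =>
                    LogHelper.Info<IndexingDiagnostics>("NodeIndexing on " + name + " for node " + e.NodeId);
            }
        }
    }
}
```

Info and Warn levels are used here on the assumption that Debug messages may be filtered out by the log4net configuration.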

    Regards

    Ismail

  • Dan Evans 629 posts 1016 karma points
    Jan 28, 2014 @ 18:02

     

    ExamineEvents_NodeIndexing does not get fired at all.

    I meant to say that both the external index and the PDF index have content and are indexing separately, as I've checked via Examine Management.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 29, 2014 @ 10:05

    Dan,

    I don't think it's Examine that's the issue; it's ApplicationEventHandler. So, just to confirm that, can you wire up a document publish event, put some logging code in it and see if that fires? I have seen this before in v6, and I changed from ApplicationEventHandler to IApplicationEventHandler, even though that is older.
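A minimal sketch of that publish-event check for Umbraco 7, assuming the `ContentService.Published` event and `LogHelper` from Umbraco.Core; `PublishDiagnostics` is a hypothetical class name. It logs at Info level rather than Debug, on the assumption that Debug messages may be filtered out by the log4net configuration:

```csharp
using Umbraco.Core;
using Umbraco.Core.Events;
using Umbraco.Core.Logging;
using Umbraco.Core.Models;
using Umbraco.Core.Publishing;
using Umbraco.Core.Services;

namespace Umbraco.Extensions.EventHandlers
{
    // Hypothetical diagnostic handler: if neither message appears in the log,
    // the startup handler itself is not being picked up.
    public class PublishDiagnostics : ApplicationEventHandler
    {
        protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
            LogHelper.Info<PublishDiagnostics>("PublishDiagnostics.ApplicationStarted fired");
            ContentService.Published += ContentService_Published;
        }

        private void ContentService_Published(IPublishingStrategy sender, PublishEventArgs<IContent> e)
        {
            foreach (var content in e.PublishedEntities)
            {
                LogHelper.Info<PublishDiagnostics>("Published node " + content.Id);
            }
        }
    }
}
```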

    Regards

    Ismail

  • Dan Evans 629 posts 1016 karma points
    Jan 29, 2014 @ 10:27

    Hi Ismail

    I tried that, but as there was an error I assumed Umbraco 7 didn't support it:

    
    Compiler Error Message: CS0535: 'Umbraco.Extensions.EventHandlers.RegisterEvents' does not implement interface member 'Umbraco.Core.IApplicationEventHandler.OnApplicationInitialized(Umbraco.Core.UmbracoApplicationBase, Umbraco.Core.ApplicationContext)'

    Source Error:

     
    Line 18: namespace Umbraco.Extensions.EventHandlers
    Line 19: {
    Line 20: public class RegisterEvents : IApplicationEventHandler
    Line 21:     {
    Line 22: 
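CS0535 means the class declares the interface without implementing all of its members. A minimal sketch, assuming the three-method `IApplicationEventHandler` interface from Umbraco.Core (`OnApplicationInitialized`, `OnApplicationStarting`, `OnApplicationStarted`); the empty bodies are placeholders, and the Examine wiring from the ApplicationEventHandler version moves into `OnApplicationStarted`. It is meant to replace that version rather than sit alongside it, since two classes named `RegisterEvents` in one namespace will not compile and two handlers would attach the event twice.

```csharp
using Examine;
using Umbraco.Core;
using Umbraco.Core.Logging;

namespace Umbraco.Extensions.EventHandlers
{
    public class RegisterEvents : IApplicationEventHandler
    {
        // All three interface members must be implemented, even if empty;
        // leaving any out produces CS0535.
        public void OnApplicationInitialized(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
        }

        public void OnApplicationStarting(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
        }

        public void OnApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
            LogHelper.Info<RegisterEvents>("RegisterEvents.OnApplicationStarted fired");
            ExamineManager.Instance.IndexProviderCollection["PDFIndexer"].NodeIndexing += ExamineEvents_NodeIndexing;
        }

        private void ExamineEvents_NodeIndexing(object sender, IndexingNodeEventArgs e)
        {
            // Placeholder: same body as ExamineEvents_NodeIndexing in the ApplicationEventHandler version
        }
    }
}
```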
  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 29, 2014 @ 10:46

    Did you try document publish event?

    Regards

    Ismail

  • Dan Evans 629 posts 1016 karma points
    Jan 29, 2014 @ 10:47

    I did.

    Dan

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 29, 2014 @ 10:48

    And that fired?

  • Dan Evans 629 posts 1016 karma points
    Jan 29, 2014 @ 10:50

    Sorry, no it didn't. I get the same error (above). It's not even getting that far. There's a compilation error.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 29, 2014 @ 10:51

    Was that using ApplicationEventHandler, not the interface? If you can, use the original handler and see if it fires. If not, I would raise it on the issue tracker, as there is something else wrong with wiring up events.

  • Dan Evans 629 posts 1016 karma points
    Jan 29, 2014 @ 11:35

    OK, thanks. No, it's not firing using ApplicationEventHandler.

    Thanks for your help.

    Dan
