  • Niclas Schumacher 67 posts 87 karma points
    Jul 11, 2013 @ 15:35
    Niclas Schumacher
    0

    examine using IndexingNodeDataEventArgs

    Hey guys!
    Over the last few days I've been figuring out how to use Examine and Lucene, and now I'm ready to do some programming!

    But I have an issue.

    I need to add a field to the index, which I do with e.Fields.Add("Field", value);

    My problem is that I need to get some information from the IndexingNodeDataEventArgs e parameter, and it seems like e contains every document each time, so I can't figure out how to read a field on the current document to get the information I want to put into the new field.

    How do I differentiate to get the data I need when e contains the same thing each time an item gets indexed? Shouldn't e be specific to the document, so that I can get information from e and set it to be indexed as a field?

    Hope you guys know what I want to do; otherwise please ask :)

    Thanks in advance.

     

    - Niclas Schumacher

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 11, 2013 @ 15:58
    Ismail Mayat
    0

    Niclas,

    You have e.Node, where Node is an XElement, and you can get data from there. Alternatively, you can take e.NodeId and then new up a Document.

    Regards

    Ismail
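
As a minimal sketch of the first option, the helper below reads a property from the XElement of the item being indexed and copies it into the field collection. The helper name and its parameters are hypothetical, and it assumes that properties appear as child elements named after their alias and that e.Fields can be treated as a string-to-string dictionary.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class IndexFieldHelper
{
    // node:   the XElement for the single item being indexed (e.Node in the thread).
    // fields: the collection that ends up in the index (e.Fields in the thread).
    // Assumption: the property is a child element named after its alias.
    public static void AddFieldFromNode(
        XElement node,
        IDictionary<string, string> fields,
        string propertyAlias,
        string fieldName)
    {
        XElement property = node.Element(propertyAlias);

        // Skip items that do not have the property or have it empty.
        if (property == null || string.IsNullOrWhiteSpace(property.Value))
        {
            return;
        }

        // The indexer assignment overwrites an existing key instead of throwing.
        fields[fieldName] = property.Value;
    }
}
```

Inside the indexing event handler this would be called as IndexFieldHelper.AddFieldFromNode(e.Node, e.Fields, "borger", "Borger"); because e.Node only holds the current item, each call sees that item's own data.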

  • Niclas Schumacher 67 posts 87 karma points
    Jul 12, 2013 @ 10:17
    Niclas Schumacher
    0

    Thanks Ismail, that did the job.

    I've got another problem too, sadly.

    The information I need to fetch for indexing contains HTML, so at the moment the HTML is searchable, which isn't great.

    So my question to you is: can I, with Lucene/Examine, skip the HTML and only index the text? Or is there a workaround? The data is generated by an external service which we don't have any control over, sadly.

    My code:

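     var document = new Document(e.NodeId);
     var property = document.getProperty("borger");

     // Only index when the property exists and holds a value.
     // property.Value is an object, so compare its string form, not the reference.
     if (property != null && property.Value != null && property.Value.ToString() != "")
     {
         // The property stores XML whose ArticleId attribute points at the external article.
         XElement xmlProperty = XElement.Parse(property.Value.ToString());

         var currentNode = _repository.GetArticleById(Convert.ToInt32(xmlProperty.Attribute("ArticleId").Value));

         var content = _repository.SplitArticleContent(currentNode.Content);

         // MainContent still contains the raw HTML at this point.
         e.Fields.Add("Borger", content.MainContent);
     }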

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 12, 2013 @ 10:22
    Ismail Mayat
    0

    Niclas,

    The code above is for another issue, which is now fixed? However, you now have another issue where you are indexing custom data (not Umbraco data?), or it is Umbraco data but the content is entered into an Umbraco document from a third-party source? I just need the context of the problem, then hopefully I should be able to fix it.

    During indexing, Examine should strip out HTML.

    Regards

    Ismail

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 12, 2013 @ 10:25
    Ismail Mayat
    0

    Niclas,

    OK, I just re-read your code and now understand the problem. Can you take a look at your ExamineSettings.config file and check what the analyzer is set to for the indexer? I still think it should strip the HTML. However, what you could try is

     e.Fields.Add("Borger", umbraco.library.StripHtml(content.MainContent));

    That should also strip it out, just giving you the content without the HTML.

    Regards

    Ismail

  • Niclas Schumacher 67 posts 87 karma points
    Jul 12, 2013 @ 10:26
    Niclas Schumacher
    0

    Hello Ismail,

    The code is working at the moment. But the Borger field is getting all the HTML with it, so when I search for div, for instance, I get that document because there is a div in the content :) which I would like not to happen.
    The id is stored in Umbraco; I fetch the id and use my repository to get the content from the external service. So I have no control over what the HTML looks like.

    What I want to achieve is to get rid of the HTML at indexing time, if that is possible, or at least to make sure the HTML tags and attributes aren't searchable.

    Thanks for the help!

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 12, 2013 @ 10:29
    Ismail Mayat
    0

    Niclas,

    Try umbraco.library.StripHtml; that should get rid of it, and it's part of the Umbraco core. If that doesn't work, you could load the HTML into an HtmlAgilityPack document and use that to strip the HTML. I think later Umbraco versions ship with HtmlAgilityPack; if not, you can download the library from http://htmlagilitypack.codeplex.com/
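
A rough sketch of the HtmlAgilityPack fallback could look like the following. The class and method names (HtmlText.Strip) are hypothetical; it assumes the HtmlAgilityPack assembly is referenced, and it joins text nodes with spaces so that words from adjacent tags do not run together in the index.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HtmlText
{
    // Returns the visible text of an HTML fragment, without tags or attributes.
    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var parts = doc.DocumentNode
            .Descendants()
            // Keep text nodes only; elements and comments are dropped.
            .Where(n => n.NodeType == HtmlNodeType.Text)
            // Ignore script and style bodies, which are not page content.
            .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style")
            // Decode entities such as &amp; and trim surrounding whitespace.
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }
}
```

In the handler from the earlier post this would be used as e.Fields.Add("Borger", HtmlText.Strip(content.MainContent)); one trade-off of joining with spaces is that a word split by inline markup (for example foo<b>bar</b>) is indexed as two terms.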

  • Niclas Schumacher 67 posts 87 karma points
    Jul 12, 2013 @ 10:41
    Niclas Schumacher
    0

    I'll check it out if we can't get Examine to do it.

    I'm just using the standard analyzer with my indexer.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 12, 2013 @ 10:43
    Ismail Mayat
    0

    Niclas,

    Right, when Umbraco content is being indexed, the HTML is stripped out somewhere within Umbraco or Examine; I suspect it's Umbraco doing it. In your case it's external data, so Umbraco has no knowledge of it and the HTML is still there. So try umbraco.library.StripHtml; failing that, use HtmlAgilityPack.

    Regards

    Ismail

  • Niclas Schumacher 67 posts 87 karma points
    Jul 12, 2013 @ 10:46
    Niclas Schumacher
    0

    Okay, I'll give it a try.

    And by the way, many thanks for being so active in blogging and on the forums about this topic; you keep showing up in more or less every search I make.

    One last question: I've seen in a lot of places that the Lucene in Action book is the best place to learn about Lucene. Can you confirm this is still true? Or is there better reading elsewhere now that I'm using Lucene.Net and Examine?

    Once again, thanks for the help!

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 12, 2013 @ 10:55
    Ismail Mayat
    1

    Niclas,

    I highly recommend Lucene in Action, 2nd edition, just to get a deeper understanding of Lucene. There is also my session from the UK festival (video and slides) at http://umbracoukfestival.co.uk/videos-photos/, and I wrote 12/13 posts going into a bit more detail on different indexing topics with Examine; see http://thecogworks.co.uk/blog/posts/2012/november/examiness-hints-and-tips-from-the-trenches-part-1/

    Regards

    Ismail

  • Niclas Schumacher 67 posts 87 karma points
    Jul 12, 2013 @ 10:56
    Niclas Schumacher
    0

    Okay, that was also my thought. 
    I've been through all of those, great stuff! :) 
