Examine Add Custom Grid Plugins to Examine Index
I've gone down the Skybrud route of creating custom converters to index grid content in an Examine index, with the aim of building a combined search index that holds both normal and weighted content (titles etc.). For example:
value = GridControlInfoBlockCollectionValue.Parse(control, token as JObject);
GridControlInfoBlockCollectionValue indexes the heading and then a JSON collection of items, each with its own title, content etc.:
[JsonProperty("heading")]
public string Heading { get; protected set; }
[JsonProperty("items")]
public IEnumerable<GridControlInfoBlockValue> Items { get; protected set; }
Has anyone done anything in this area, for example parsing the plugin HTML templates, to index custom grid plugins in a more generic way?
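For anyone following along, here is a rough sketch of the kind of value class I mean, written against Json.NET only. It is not the actual class: InfoBlockCollection and InfoBlockItem are hypothetical stand-ins without the Skybrud base types, and the "title" and "content" property names on the items are assumptions about the plugin's JSON.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical item shape: "title" and "content" are assumed JSON property names.
public class InfoBlockItem
{
    [JsonProperty("title")]
    public string Title { get; set; }

    [JsonProperty("content")]
    public string Content { get; set; }
}

// Hypothetical stand-in for GridControlInfoBlockCollectionValue.
public class InfoBlockCollection
{
    [JsonProperty("heading")]
    public string Heading { get; set; }

    [JsonProperty("items")]
    public List<InfoBlockItem> Items { get; set; }

    // Deserialize the grid control's JSON value; returns null when there is no value.
    public static InfoBlockCollection Parse(JObject obj)
    {
        return obj == null ? null : obj.ToObject<InfoBlockCollection>();
    }

    // Heading plus item titles, intended for a separate field that can be weighted.
    public string GetTitleText()
    {
        var titles = (Items ?? new List<InfoBlockItem>()).Select(i => i.Title);
        return Join(new[] { Heading }.Concat(titles));
    }

    // Item content only, intended for the normal (unweighted) field.
    public string GetBodyText()
    {
        return Join((Items ?? new List<InfoBlockItem>()).Select(i => i.Content));
    }

    // Join the non-empty parts with single spaces.
    private static string Join(IEnumerable<string> parts)
    {
        return string.Join(" ", parts.Where(p => !string.IsNullOrWhiteSpace(p)));
    }
}
```

Splitting titles from body text is what makes the weighting possible, since the two strings can go into different index fields.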
Cheers. Tom
Tom,
I did some dirty screen scraping using HtmlAgilityPack on a site where we had loads of grid content. The grid elements also had macros that rendered content, so I went for a quick and dirty way of indexing.
This involved handling the GatheringNodeData event: for the current page, make a request, load the page content into HtmlAgilityPack, use XPath to get the grid content, and push the content of that node, minus the HTML, into the index.
One advantage of this is that if the grid changes in future, I do not have to worry about it.
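The steps above might look roughly like the sketch below. Treat it as an outline under assumptions rather than the code from the site: the ResolveUrl hook, the "gridContent" field name and the XPath for the grid wrapper are all hypothetical, and "ExternalIndexer" is only the default Umbraco 7 index name.

```csharp
using System;
using System.Net;
using System.Text;
using Examine;
using HtmlAgilityPack;
using Umbraco.Core;
using UmbracoExamine;

public class GridScrapeIndexer : ApplicationEventHandler
{
    // Hypothetical hook: supply your own lookup from node id to absolute URL.
    public static Func<int, string> ResolveUrl = id => null;

    protected override void ApplicationStarted(
        UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        // Assumes the default index name; adjust if yours differs.
        ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"]
            .GatheringNodeData += OnGatheringNodeData;
    }

    private static void OnGatheringNodeData(object sender, IndexingNodeDataEventArgs e)
    {
        if (e.IndexType != IndexTypes.Content) return;

        var url = ResolveUrl(e.NodeId);
        if (string.IsNullOrEmpty(url)) return;

        try
        {
            string html;
            using (var client = new WebClient { Encoding = Encoding.UTF8 })
            {
                html = client.DownloadString(url);
            }

            // "gridContent" is a made-up field name; the XPath assumes a wrapper
            // element carrying the umb-grid class.
            e.Fields["gridContent"] = ExtractText(html, "//div[contains(@class,'umb-grid')]");
        }
        catch (WebException)
        {
            // Page could not be fetched; skip the field rather than fail indexing.
        }
    }

    // Return the text of the first node matching the XPath, with entities decoded.
    public static string ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? string.Empty : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Keeping ExtractText separate from the event handler means the XPath can be tried against saved HTML without touching the index.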
Regards
Ismail
Hi Ismail, sounds like a decent way to go in our case too, given the macros etc.
Were there any gotchas I should be aware of from your time in the trenches?
Did you end up using umbracoHelper.RenderTemplate, or did you stick with the scraping approach to deal with the macros etc.?
Also, with the scraping, did you experience Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed exceptions?
Is there a way to lock and release, or something similar, to cope with the longer-running scraping operation?
Thanks for the reply :)
Tom,
The site I was working on was multilingual, so there was a bit of messing around to get the correct URL to scrape. Overall it all worked fine. If you have a lot of content and you do a full rebuild, it may slow things down a bit; however, on that site we have not had to do a full rebuild. You just publish a page and the GatheringNodeData handler kicks in.
I have it as an exercise on the Examine course under indexing complex data types. I also posted code here: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/76413-examine-indexing-rich-and-complex-property-editors
Regards
Ismail
Thanks Ismail, I implemented something very similar and had to remove conditional comments etc. to get some things cleaned up.
The main issue I've had since implementing it in 7.6.5 is the IndexReader closed exception if the scrape takes too long.
I got around it by getting the scraper to send a header, then checking for that header and leaving out excess calls to tracking tags, scripts etc., but the error still crops up intermittently.
Also, is the ExamineHelper in that example a singleton or a static class of some type?
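In case it helps anyone else, the header and clean-up parts look something like this. It is a simplified sketch, not the exact code: the X-Index-Scrape header name and the ScrapeHelper class are made up for illustration, and it does not address the IndexReader exception itself.

```csharp
using System.Linq;
using System.Net;
using System.Text;
using HtmlAgilityPack;

public static class ScrapeHelper
{
    // Hypothetical header name; the site's layout checks for it and skips
    // tracking tags and other scripts when it is present.
    public const string ScrapeHeader = "X-Index-Scrape";

    // Fetch the page, flagging the request as coming from the indexer.
    public static string Download(string url)
    {
        using (var client = new WebClient { Encoding = Encoding.UTF8 })
        {
            client.Headers.Add(ScrapeHeader, "1");
            return client.DownloadString(url);
        }
    }

    // Drop scripts, styles and comments (conditional comments included),
    // then return the remaining text with entities decoded.
    public static string StripNoise(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var noise = doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (noise != null)
        {
            foreach (var node in noise.ToList())
            {
                node.Remove();
            }
        }

        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

On the rendering side the layout only needs to test whether Request.Headers contains that header before writing out the tracking markup, which keeps the scraped response smaller and quicker to produce.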
Thanks again for your time