Search through Nested Content Collection that contains Rich Text Editors for a string
Hello,
I have a rather weird search functionality to implement and I was thinking of Examine.
I have nested content items that have text in them. I'd like the website's search functionality to crawl through all of them, searching for the query string.
I thought about doing it by field e.g.
if (myField.Contains(query))
But that's impossible as there are a bunch of fields in the module. Is there a way to go through the IEnumerable Collection of Nested Content and search for a string?
I'm not sure how to approach this. Anyone that has done something similar?
Thanks
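For reference, the brute-force approach described in the question (checking every text field of every nested item for the query string) might be sketched as below. This is only an illustration under stated assumptions: `NestedItem`, its `Fields` dictionary, and `NestedTextSearch` are hypothetical stand-ins for nested content items and their RTE properties, not Umbraco, Examine, or Look types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for one nested content item:
// property alias -> raw RTE HTML.
public sealed class NestedItem
{
    public Dictionary<string, string> Fields { get; } = new Dictionary<string, string>();
}

public static class NestedTextSearch
{
    // Crude tag stripper for illustration only; not a full HTML parser.
    public static string StripHtml(string html)
    {
        return Regex.Replace(html ?? string.Empty, "<[^>]+>", " ");
    }

    // True if any field of any item contains the query (case-insensitive).
    public static bool ContainsText(IEnumerable<NestedItem> items, string query)
    {
        if (string.IsNullOrWhiteSpace(query)) return false;

        return items
            .SelectMany(item => item.Fields.Values)
            .Any(html => StripHtml(html)
                .IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

This scans every field on every request, so it avoids the per-field `Contains` calls but not the cost of reading all the content; an index is the usual way around that.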
Hi Harry,
The package Look might be able to help as it'll index Nested Content items as full Lucene Documents, and you can set a custom Text indexer for these.
Oh thanks a lot Hendy! When you say full Lucene Documents, do you mean a page (as in, without modules)? I haven't heard the term before.
No, the Examine Umbraco indexers create a Lucene document for each Umbraco Content, Media or Member being indexed, so Nested Content being a property on these will be indexed into the same Lucene document.
Separating Nested Content items out into their own Lucene documents allows us to search on them more easily.
Ah I see. I didn't know that. But once you have them as Lucene documents, do they have any link to the original page? Ideally I'd like to have the Url of the page they are part of.
eg. to find all nested content items with some text:
to get the 'host' urls for each of the found detached content items (nested content):
You're a god, that's exactly what I need. I appreciate the help a lot! Cheers.
You'll need to configure an indexer, so that the text you want indexed for each nested content item is actually indexed. e.g.
I have quite a few rte properties with different aliases. I can do that in the .config where the indexer is by adding rte field aliases instead of in code, right?
You can combine all the RTEs into the single text field (assuming you just want to find the nested content item via some text without needing to know which RTE it came from)
Sorry, I forgot to mention you'll need to use a Look indexer (which is also an Examine indexer) rather than an Examine Umbraco indexer, as it's the Look indexer that handles detached content, creating the Lucene documents for it. e.g.
add these into ExamineSettings.config:
and this into ExamineIndex.config:
What about if I do it like this and then let it search its index? Or do I specifically have to set it with my C# code?
Unfortunately that feature isn't ready yet :(
Ah I see. You talked about combining all RTEs into a single text field. What do you mean by that?
Hi Harry, can you describe the query you'd like to make?
(reason to combine all RTEs into a single field, is to be able to search for text in any of the RTEs by only searching one field)
It's just a searchbar, so the query is not exactly a specific one. User searches for an article on a page and then I would like to search the whole article (which is comprised of Nested Content modules that contain RTE editors with different aliases) and see if the string that they typed in is in there.
I'm trying out your code but for some reason whatever I type comes back as empty. I've added my nested content collection aliases in the NodeQuery etc.
(don't forget to index, by saving the node again, or triggering a full index rebuild)
As you want to find pages from text, it's probably a good idea to combine/munge all text related to a page into a single field. You could avoid using Look and do this manually by hooking into the GatheringNodeData event and pulling all relevant text out of the nested content items, but Look would work quite well here.
BTW, the nested content Alias will be the document type used as the source for the nested content (rather than a property alias)
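The "combine/munge all text into a single field" idea could look roughly like the sketch below. `CombinedText.Build` is a hypothetical helper, not part of Look or Examine. The assumption is that a GatheringNodeData handler (or a Look text indexer) would collect the RTE values from the nested content items, pass them through something like this, and store the result in one extra index field with a name of your choosing. How the RTE values are read from nested content is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: not a Look or Examine API.
public static class CombinedText
{
    // Joins the plain text of several RTE HTML values into one searchable string.
    public static string Build(IEnumerable<string> rteHtmlValues)
    {
        var parts = rteHtmlValues
            .Where(html => !string.IsNullOrWhiteSpace(html))
            // Replace tags with spaces (crude; fine for indexing, not for display).
            .Select(html => Regex.Replace(html, "<[^>]+>", " "))
            // Decode entities such as &amp; into their characters.
            .Select(text => WebUtility.HtmlDecode(text))
            // Collapse runs of whitespace left behind by removed tags.
            .Select(text => Regex.Replace(text, @"\s+", " ").Trim())
            .Where(text => text.Length > 0);

        return string.Join(" ", parts);
    }
}
```

With everything in one field, the search only has to target that field, whichever RTE alias the text originally came from.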
Well,
Without using any of your code, if I go into Examine management and trigger an index rebuild on the look index that you gave me, it throws a stack overflow exception.
I'm not sure how to proceed in this case.
Doh! Any chance you could raise an issue here?
Sure but can't today, will do as soon as I'm free.
Hi Harry, thanks for raising an issue - just posting here as a reference to it.
I think Look is exactly what's needed here. I'm not sure what's going wrong just yet and why the enumeration comes back as empty but I'm looking into it.
I've added the Nested Content Aliases as the document type used and I can see that in the backoffice LookIndexer (I've named it that) has 33 documents in Index. Every time I try to trigger an index rebuild the application shuts down without telling me the error. It just says 'The application is in break mode'.
That's odd, I'll look more into it.
Edit: It's a Stack Overflow Exception