Index Content node's multi media picker name attribute
Hi,
In Umbraco 7, I have a doc type which has a header field, rich text editor field and a multi media picker.
The author creates pages using this doc type and it displays a web page with the heading, the text and then a list of the documents the author wants to allow the visitor to download.
The website has an Examine search facility and the indexer and searcher can be used to return the pages containing matching terms. It seems to be indexing against the header and text fields but it isn't indexing the contents of the multi media picker. For example, if the file is called AdultBallet.pdf, a search for Adult or AdultBallet is not returning anything.
Ideally I didn't want to index the Media section itself because this would return the files in a search, but not the page the file is sitting on in the website. The client needs the user to be able to search for "Adult" and have the search results return the page the file is available from. If that makes sense?
Any help or advice on how to do this would be much appreciated.
The following indexer and searcher are defined in ExamineSettings.config:
So I should have posted this nearly 12 months ago, when I fixed it at the time, but I have finally got around to it!
Nik and Ismail's replies really helped.
In my case, I wanted to index the name of each document the content author adds to a node type in Umbraco, so Nik's advice was the most helpful. If I ever need to index the content as well, I will be able to thanks to Ismail's help.
I'll post the code here in case it's ever useful to anyone else.
Add a class that implements IApplicationEventHandler:
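using Examine;
using Umbraco.Core;
using Umbraco.Web;

public class AppEvents : IApplicationEventHandler
{
    // The interface also requires these two methods; they can stay empty.
    public void OnApplicationInitialized(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext) { }
    public void OnApplicationStarting(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext) { }

    public void OnApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        // Create the helper once and pass it to the handler on every indexing event.
        var helper = new UmbracoHelper(UmbracoContext.Current);
        ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"].GatheringNodeData
            += (sender, e) => ExamineEvents.GatheringContentData(sender, e, helper);
    }
}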
Now below I will create the static class that was referenced in the ApplicationEventHandler above.
When creating the search index, the code needs to check if the content author has added one or more media items ("ItemDocuments") inside a document type called AccordionTabs. If some documents do exist, then loop through the media items, getting their GUID, retrieving them as TypedMedia to allow me to add the media item's name to the index.
I also had to put a line in there to try to get the media items as UDI documents, or to retrieve those documents that use an integer instead as their id. Some documents seemed to be stored as Umbraco.Core.Udi but some others were not. I'm not sure why, but adding in this check meant no documents were missed from the index.
using System;
using System.Collections.Generic;
using System.Text;
using Examine;
using Umbraco.Core.Models;
using Umbraco.Web;

public static class ExamineEvents
{
    public static void GatheringContentData(object sender, IndexingNodeDataEventArgs e, UmbracoHelper helper)
    {
        if (e.IndexType == "content")
        {
            var content = helper.TypedContent(e.NodeId);
            // Custom indexing logic
            try
            {
                if (content != null && content.ContentType.Alias == "AccordionTab" && content.HasProperty("AccordionTabs") && content.HasValue("AccordionTabs"))
                {
                    IEnumerable<IPublishedContent> tabs = (IEnumerable<IPublishedContent>)content.GetProperty("AccordionTabs").Value;
                    if (tabs != null)
                    {
                        // Collects the names of all picked documents, one per line.
                        var fieldDocuments = new StringBuilder();
                        foreach (var t in tabs)
                        {
                            if (t != null && t.HasProperty("ItemDocuments") && t.HasValue("ItemDocuments"))
                            {
                                var ipublishDocs = t.GetPropertyValue<IEnumerable<IPublishedContent>>("ItemDocuments");
                                if (ipublishDocs == null)
                                {
                                    // The picker stored UDIs: look each media item up by its GUID.
                                    var udiDocs = (IEnumerable<Umbraco.Core.Udi>)t.GetProperty("ItemDocuments").Value;
                                    foreach (Umbraco.Core.GuidUdi d in udiDocs)
                                    {
                                        string id = d.Guid.ToString();
                                        var m = helper.TypedMedia(id);
                                        if (m != null)
                                        {
                                            fieldDocuments.AppendLine(m.Name);
                                        }
                                    }
                                }
                                else
                                {
                                    // The picker returned published items: look each one up by its integer id.
                                    foreach (var d in ipublishDocs)
                                    {
                                        var m = helper.TypedMedia(d.Id);
                                        if (m != null)
                                        {
                                            fieldDocuments.AppendLine(m.Name);
                                        }
                                    }
                                }
                            }
                        }
                        // Add the collected names to the index as a custom field.
                        e.Fields.Add("websiteDocuments", fieldDocuments.ToString());
                    }
                }
            }
            catch (Exception)
            {
                // Rethrow without resetting the stack trace ("throw err;" would reset it).
                throw;
            }
        }
    }
}
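A side note on matching, for anyone reusing this: depending on the analyzer configured for the index, a name such as AdultBallet.pdf may end up in the index as a single term, in which case a search for Adult only matches with a wildcard. Below is a minimal sketch of one way around that, assuming you are happy to index a split-up copy of the name next to the original. MediaNameText and ToSearchableText are hypothetical names, not part of Umbraco or Examine.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Umbraco or Examine.
public static class MediaNameText
{
    // Returns the original name followed by a copy split into separate words,
    // e.g. "AdultBallet.pdf" becomes "AdultBallet.pdf Adult Ballet pdf".
    public static string ToSearchableText(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
        {
            return string.Empty;
        }

        // Replace separators such as '.', '_' and '-' with spaces.
        var spaced = Regex.Replace(name, @"[._\-]+", " ");

        // Insert a space wherever a lower-case letter or digit is followed by an upper-case letter.
        spaced = Regex.Replace(spaced, @"(?<=[a-z0-9])(?=[A-Z])", " ");

        // Keep the original name first so exact-name searches still have something to match.
        return name + " " + spaced.Trim();
    }
}
```

In the handler above, the call would then be fieldDocuments.AppendLine(MediaNameText.ToSearchableText(m.Name)) in place of fieldDocuments.AppendLine(m.Name).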
An IndexSet is defined in ExamineIndex.config:
Hey Ian,
Okay, so what you are seeing is default behaviour for things like Media Pickers, Multi Node Tree Pickers, Content Pickers etc.
In order to get better information into your indexes you'll need to add some start up events and hook into indexing.
These two resources should give you a good base for looking at how to do this :-)
https://shazwazza.com/post/searching-multi-node-tree-picker-data-or-any-collection-with-examine/
https://staheri.com/my-blog/2015/march/custom-examine-indexing-using-umbraco-cache/
The latter is of particular importance as it talks about getting access to an Umbraco Helper, which means you can pass an ID in and get a Typed Media item back to extract information :-)
Hope that helps.
Nik
Ian,
You will need to handle the gathering node data event. In the handler, test for the content type, then test whether the picker field has a value, and then use the id to get the media item.
Next you will need to extract the content of the pdf and inject that extracted content into an Examine field. For the pdf extraction you could install the Examine pdf indexer and use its api to extract the contents of the pdf, or you could write something yourself with 3rd party pdf libraries.
So it's not out of the box, but with a bit of code it is doable.
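Whichever custom field ends up in the index, the search itself also has to query it, because a fluent Examine query only searches the fields it names. Below is a minimal sketch, assuming the default ExternalSearcher, the websiteDocuments field name used in this thread, and a single-word search term. DocumentSearch and SearchPages are hypothetical names. The wildcard is there so that Adult can match a longer term such as AdultBallet.pdf.

```csharp
using System;
using Examine;
using Examine.LuceneEngine.SearchCriteria;
using UmbracoExamine;

// Hypothetical class name; the searcher and field names are assumptions.
public static class DocumentSearch
{
    public static ISearchResults SearchPages(string term)
    {
        if (string.IsNullOrWhiteSpace(term))
        {
            throw new ArgumentException("A search term is required.", "term");
        }

        // "ExternalSearcher" is assumed to be the searcher defined in ExamineSettings.config.
        var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];

        // Only content nodes are searched, so media items themselves are not returned.
        var criteria = searcher.CreateSearchCriteria(IndexTypes.Content);

        // Match the term as a prefix in either the page name or the custom field.
        var fields = new[] { "nodeName", "websiteDocuments" };
        var query = criteria
            .GroupedOr(fields, term.Trim().MultipleCharacterWildcard())
            .Compile();

        return searcher.Search(query);
    }
}
```

Each result's Id is the id of the content node, so the page can be loaded from it. Pages that were indexed before the field was added only pick it up once they are re-indexed, for example by rebuilding the index.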