Custom Examine indexer that indexes any Umbraco media node. Under the hood it uses Apache Tika to extract content and metadata from Umbraco media files. Tika handles a wide range of file formats; see the Apache Tika documentation for the full list of supported formats. The package also supports VPP (Virtual Path Provider), so if your media files are stored in Azure or a similar remote store, they will be indexed as well.
This package is supported on Umbraco 7.6.1+.
ExamineFileIndexer is available from Our Umbraco, NuGet, or as a manual download directly from GitHub.
PLEASE NOTE: the Umbraco package does not contain the Tika DLLs because of the 10MB file size restriction on Our Umbraco. After installing the Umbraco package, download the Tika zip from Dropbox (https://www.dropbox.com/s/1d9wkom2gbaax2h/tika.zip?dl=0) and extract the DLLs into the bin folder of your site.
After installation, your ExamineIndex.config and ExamineSettings.config files will be updated with the following entries. In ExamineIndex.config:
<IndexSet SetName="MediaIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/MediaIndexSet">
  <IndexAttributeFields>
    <add Name="id" />
    <add Name="nodeName" />
    <add Name="updateDate" />
    <add Name="writerName" />
    <add Name="path" />
    <add Name="nodeTypeAlias" />
    <add Name="parentID" />
  </IndexAttributeFields>
  <IncludeNodeTypes>
    <add Name="File" />
  </IncludeNodeTypes>
</IndexSet>
In ExamineSettings.config, under ExamineIndexProviders/providers:
<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer"
     extensions=".pdf,.docx"
     umbracoFileProperty="umbracoFile" />
In ExamineSettings.config, under ExamineSearchProviders/providers:
<add name="MediaSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" indexSet="MediaIndexSet"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
By default, only pdf and docx files are indexed. To index other file types, update the MediaIndexer entry in ExamineSettings.config:
<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer"
     extensions=".pdf,.docx"
     umbracoFileProperty="umbracoFile" />
Update the extensions attribute to add any other file types. Extensions must be separated by commas (,).
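As an illustration, assuming you also want older Word documents plus Excel and PowerPoint files indexed, the MediaIndexer entry could look like the sketch below. The extra extensions (.doc, .xlsx, .pptx) are only examples, not a list shipped with the package; check that Tika supports each format you add.

```xml
<!-- Example only: .doc, .xlsx and .pptx are illustrative additions to the default ".pdf,.docx" -->
<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer"
     extensions=".pdf,.docx,.doc,.xlsx,.pptx"
     umbracoFileProperty="umbracoFile" />
```

Note that the list is a single comma-separated string with no spaces, and each entry keeps its leading dot, matching the default value.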
You can also add image file types, e.g. .jpg. PLEASE NOTE: indexing images only adds their EXIF metadata.
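The MediaIndexSet added on installation only lists the File media type under IncludeNodeTypes. If your images are stored under a different media type, adding .jpg to the extensions may not be enough on its own. A minimal sketch, assuming your images use Umbraco's default "Image" media type alias and that this indexer honours the IndexSet's IncludeNodeTypes list (both are assumptions, not confirmed by the package), would be to add that alias in ExamineIndex.config:

```xml
<!-- Inside the MediaIndexSet IndexSet in ExamineIndex.config -->
<IncludeNodeTypes>
  <add Name="File" />
  <!-- Assumption: images use the default "Image" media type alias -->
  <add Name="Image" />
</IncludeNodeTypes>
```

After changing the configuration, the media index will most likely need to be rebuilt before existing images appear in search results.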