CogUmbracoExamineMediaIndexer
=============================
The CogUmbracoExamineMediaIndexer is a custom Examine indexer that can index any Umbraco media node. Under the hood it uses Apache Tika (http://tika.apache.org/) to extract content and metadata from Umbraco media files. The formats Tika can handle are listed at http://tika.apache.org/1.2/formats.html#Audio_formats

Installation instructions
=========================
Install the package. It should copy the DLLs to the bin directory and update your Examine config files.

IMPORTANT: after installing, you must download the Tika DLL from https://www.dropbox.com/s/0rk556kjgd8swvy/tika-app-1.2.dll and copy it to the bin directory. I could not include it in the package because of the file size restriction on our.umbraco.org/projects.

Manual installation
===================
Copy all DLLs to the bin directory. Then copy the entries from the sample ExamineIndex and ExamineSettings config files into your own Examine config files. Do not overwrite your config files with the sample ones, as you may already have index configuration in them.

The sample only indexes files. If you want to index images as well (only EXIF metadata will be extracted from them), add the following to your ExamineIndex.config under <IncludeNodeTypes>:

    <add Name="Image" />

Additionally, you will need to update ExamineSettings.config. In the extensions attribute of your media indexer

    extensions=".pdf,.docx"

add the kinds of images you want to index, e.g. .jpg. Please note that indexing images will only add EXIF metadata.
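
As a rough sketch, the two changes described above might look like the fragments below. The SetName, IndexPath, provider name and type values are placeholders, not the values shipped with the package: keep whatever the installed config already contains and change only the IncludeNodeTypes entries and the extensions attribute. The Image entry assumes the default Umbraco media type aliases.

```xml
<!-- ExamineIndex.config: inside the media index set
     (SetName and IndexPath here are placeholders) -->
<IndexSet SetName="MediaIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Media/">
  <IncludeNodeTypes>
    <add Name="File" />
    <!-- assumed alias of the default Umbraco image media type -->
    <add Name="Image" />
  </IncludeNodeTypes>
</IndexSet>

<!-- ExamineSettings.config: the media indexer provider entry
     (name and type are placeholders; leave the installed values as they are) -->
<add name="MediaIndexer"
     type="KEEP-THE-INSTALLED-TYPE"
     extensions=".pdf,.docx,.jpg" />
```

After changing either file, the media index will most likely need to be rebuilt before existing images show up in search results.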

Notes
=====
As mentioned, this library uses Apache Tika, which is a Java library. To use it from .NET it is run through IKVM (http://www.ikvm.net/devguide/net2java.html).

Credits
=======
The following article got me up and running with this idea:

http://www.dovetailsoftware.com/blogs/kmiller/archive/2010/07/02/using-the-tika-java-library-in-your-net-application-with-ikvm