  • Steven Moschidis 15 posts 45 karma points
    Nov 13, 2012 @ 22:26

    How-to

    Hi Ismail,

    Is there some kind of guide on how to use this indexer? For example: which file types it understands, whether anything needs to be done in Umbraco itself, and what exactly gets indexed.

    Also, I had quite a few errors come up for missing DLLs. I got them from IKVM's page, but it might be worth including them in the package for others.

    Cheers,

    Steven

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Nov 14, 2012 @ 12:01

    Steven,

    There is a link on the project home page listing the supported formats; see http://tika.apache.org/1.2/formats.html#Audio_formats. In ExamineSettings.config you can pass in a comma-separated list of the file extensions you want to index:

    <add name="MediaIndexer"
         type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
         extensions=".pdf,.docx"
         umbracoFileProperty="umbracoFile"/>

    Set the extensions attribute to the file types you need.

    As for the missing DLLs, that is also covered on the home page: I could not upload all the DLLs because of the 10 MB file size restriction on our.umbraco. Which DLLs were missing? I can then update the installation instructions and include links to them.

    Regards

     

    Ismail
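
As a rough illustration of how a comma-separated extensions value like the one above could be matched against media file names, here is a minimal Python sketch. The function names parse_extensions and should_index are hypothetical, and the case-insensitive matching is an assumption; the package's actual matching logic is not shown in this thread.

```python
import os


def parse_extensions(csv_value):
    """Split a value such as ".pdf,.docx" into a set of lower-cased extensions."""
    return {part.strip().lower() for part in csv_value.split(",") if part.strip()}


def should_index(file_name, extensions):
    """Return True when the file's extension is in the configured set.

    Assumption: matching is case-insensitive; the real indexer may differ.
    """
    ext = os.path.splitext(file_name)[1].lower()
    return ext in extensions


allowed = parse_extensions(".pdf,.docx")
print(should_index("/media/123/report.PDF", allowed))
print(should_index("/media/124/photo.jpg", allowed))
```

With the configuration shown in the post, only the first file would be picked up under these assumptions; adding ".jpg" to the list would include the second.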

  • Steven Moschidis 15 posts 45 karma points
    Nov 19, 2012 @ 16:20

    Hi Ismail,

    Thanks for the reply, and apologies for taking so long to get back to you!

    I can't remember which ones were missing, but the ones I have in my bin folder are the following (assume they are all prefixed by IKVM.OpenJDK except for Runtime, which is just prefixed by IKVM):

    Beans, Charsets, Corba, Core, Jdbc, Management, Media, Naming, Remoting, Security, SwingAWT, Text, Util, XML.API, XML.Parse, XML.Transform, XML.XPath and Runtime.

    As for getting the index working, I've done all of that. My question was more about which file types can be indexed beyond docx and pdf. Also, what gets indexed from the file content, and how?

    Cheers,

    Steven
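
For anyone wanting to check which of the IKVM DLLs listed above are absent from a site's bin folder, a small Python sketch might look like the following. The file names are built from the prefixes described in the post; the function names and the idea of scanning the folder this way are illustrative assumptions, not part of the package.

```python
from pathlib import Path

# Name parts listed in the post; each takes the "IKVM.OpenJDK." prefix.
OPENJDK_PARTS = [
    "Beans", "Charsets", "Corba", "Core", "Jdbc", "Management", "Media",
    "Naming", "Remoting", "Security", "SwingAWT", "Text", "Util",
    "XML.API", "XML.Parse", "XML.Transform", "XML.XPath",
]


def expected_dlls():
    """Build the full DLL file names, including IKVM.Runtime.dll."""
    names = [f"IKVM.OpenJDK.{part}.dll" for part in OPENJDK_PARTS]
    names.append("IKVM.Runtime.dll")
    return names


def missing_dlls(bin_dir):
    """Return the expected DLL names not found in bin_dir (case-insensitive)."""
    folder = Path(bin_dir)
    present = set()
    if folder.is_dir():
        present = {p.name.lower() for p in folder.iterdir() if p.is_file()}
    return [name for name in expected_dlls() if name.lower() not in present]
```

Calling missing_dlls with the path to the site's bin folder returns the names that still need to be copied in; an empty list means all of the listed DLLs are present.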

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Nov 19, 2012 @ 16:30

    Steven,

    As for file types: anything whose extension is listed on the Apache Tika page is supported, so .csv, .ppt, .xsl, .xlsx and so on. For files other than music, video and images, both the file contents and any metadata attributes are indexed. So for a PDF, the contents plus any associated metadata such as title, author and date created end up in the index in separate fields. For music, video and images, only the metadata ends up in the index; for an image, that means any EXIF metadata embedded in the image.

    Regards

    Ismail
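
The rule described above (contents plus metadata for documents, metadata only for music, video and images) can be sketched in Python as follows. This is only an illustration: the set of media extensions, the function name index_fields and the field name "content" are all hypothetical, and the real indexer's field names are not given in this thread.

```python
import os

# Assumption: an illustrative sample of audio, video and image extensions.
METADATA_ONLY_EXTENSIONS = {
    ".mp3", ".wav", ".mp4", ".avi", ".jpg", ".jpeg", ".png", ".gif",
}


def index_fields(file_name, extracted_text, metadata):
    """Return the fields that would be written to the index for one file.

    Each metadata attribute becomes its own field. The text content is added
    under the hypothetical field name "content" unless the file is a music,
    video or image file.
    """
    ext = os.path.splitext(file_name)[1].lower()
    fields = dict(metadata)
    if ext not in METADATA_ONLY_EXTENSIONS:
        fields["content"] = extracted_text
    return fields


pdf_fields = index_fields("report.pdf", "Quarterly results", {"title": "Q3", "author": "Jo"})
image_fields = index_fields("photo.jpg", "", {"exif_model": "X100"})
```

Under these assumptions pdf_fields holds title, author and content, while image_fields holds only the EXIF-style metadata.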

  • Steven Moschidis 15 posts 45 karma points
    Nov 19, 2012 @ 16:33

    Hi Ismail,

    Thanks for your response. Sorry, I hadn't realised it's just a port of Tika. I'll have a look at the project page for more info.

    Thanks for the package and all the info!

    Cheers,

    Steven
