how to

Steven Moschidis 15 posts 45 karma points

Nov 13, 2012 @ 22:26

How-to

Hi Ismail,

I was wondering if there is some kind of guide on how to use this indexer, e.g. what file types it understands, whether anything needs to be done in Umbraco itself, what exactly gets indexed, etc.?

Also, I had quite a few errors come up for missing DLLs. I went to IKVM's page to get them, but it might be worth including them in the package for others.

Cheers,

Steven

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Nov 14, 2012 @ 12:01

Steven,

There is a link on the project home page to the list of supported formats: see http://tika.apache.org/1.2/formats.html#Audio_formats. In ExamineSettings.config you can pass in a comma-separated list of the file extensions you want to index:

<add name="MediaIndexer"
     type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
     extensions=".pdf,.docx"
     umbracoFileProperty="umbracoFile"/>

Set the extensions attribute to the file types you want indexed.

As for the missing DLLs, that is also covered on the home page: I could not upload all of the DLLs because of the 10 MB file size restriction on our.umbraco. Which DLLs were missing? I can then update the installation instructions and include links to them.

Regards

Ismail

Steven Moschidis 15 posts 45 karma points

Nov 19, 2012 @ 16:20

Hi Ismail,

Thanks for the reply, and apologies for taking so long to respond!

I can't remember which ones were missing, but the ones I have in my bin folder are the following (assume they are all prefixed by IKVM.OpenJDK except for Runtime, which is just prefixed by IKVM):

Beans, Charsets, Corba, Core, Jdbc, Management, Media, Naming, Remoting, Security, SwingAWT, Text, Util, XML.API, XML.Parse, XML.Transform, XML.XPath and Runtime.

As for getting the index working, I've done all of that. My question was more about what file types can be indexed beyond docx and pdf. Also what and how is indexed from the file content?

Cheers,

Steven

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Nov 19, 2012 @ 16:30

Steven,

As for file types: any extension listed on the Apache Tika page is supported, so .csv, .ppt, .xsl, .xlsx and so on. For files other than music, video and images, the file contents and any metadata attributes are indexed. So for a PDF, as well as the contents, any associated metadata such as title, author and date created will also end up in the index in separate fields. With music, video and images, only the metadata ends up in the index, so for an image any EXIF metadata embedded in it will be indexed.
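
To illustrate, the extensions attribute in the configuration posted earlier could be widened to cover some of the types mentioned above. This is only a sketch: the particular list of extensions is an assumption chosen for illustration, and the other attributes are copied unchanged from the earlier snippet.

```xml
<!-- Sketch only: the extension list is illustrative; check each entry against the Tika formats page -->
<add name="MediaIndexer"
     type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
     extensions=".pdf,.docx,.csv,.ppt,.xlsx"
     umbracoFileProperty="umbracoFile"/>
```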

Regards

Ismail

Steven Moschidis 15 posts 45 karma points

Nov 19, 2012 @ 16:33

Hi Ismail,

Thanks for your response. Sorry, I hadn't realised it's just a port of Tika. I'll have a look at the project page for more info.

Thanks for the package and all the info!

Cheers,

Steven
