Multilingual Examine index with dynamic number of languages

Morten Bock 1867 posts 2140 karma points MVP 2x admin c-trib

Nov 04, 2014 @ 15:28

Hi everyone.

I'm looking to implement search on a multilingual site, with both European and Asian languages.

The number of languages is dynamic, since the editors can create new ones when they want, so the indexing of each language should not rely on configuration files.

Since I cannot use the same analyzer for English and Chinese, I need to do something different from the standard indexes.

Looking at Lucene.net articles, I can see that some people recommend adding language-specific fields to the index and specifying a different analyzer per field. Others recommend having a separate index per language.

So my question is:

1: Can I create a custom index with Examine that has different analyzers for different fields?

or

2: Can I dynamically specify the number of indexes I want, without changing my config files?

or

3: Should I not use Examine for this task, and just go directly to the Lucene.net APIs?

Experiences are very welcome.

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Nov 04, 2014 @ 15:34

Morten,

I think option 1 is out with Examine, although you may be able to do it with Lucene.net directly. Option 2 is doable: you would need to tap into Umbraco events so that when a new root language node is created, you update the config files programmatically. The config files still need updating, but you do it in code. Knowing up front which analyzer to use for a non-standard language may be a challenge, though.

Morten Bock 1867 posts 2140 karma points MVP 2x admin c-trib

Nov 04, 2014 @ 15:46

Thanks for the feedback Ismail.

I don't think updating the config files programmatically will solve it, since we run a multi-server setup as well, and we deploy those files from TFS. I think having a dynamic config file like that would hurt in that respect. I might try to see if I can inject my own "config reader" to fake reading the actual config files.

With regard to selecting the correct analyzer, I think I would make a map of the analyzers we have, and then default to something like the StandardAnalyzer, which works to some degree with most languages. Then we could add or map new analyzers if we find a better option later.
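
To make the map-with-fallback idea concrete, here is a minimal sketch. It is written in Python rather than C# purely for illustration, and it is not Examine or Lucene.net code: the analyzers are represented by name strings, and the language-to-analyzer pairs and the `analyzer_for` helper are hypothetical.

```python
# Hypothetical map from language code to analyzer name. In a real setup the
# values would be analyzer instances; the pairs below are assumptions.
ANALYZERS_BY_LANGUAGE = {
    "zh": "ChineseAnalyzer",
    "ja": "CJKAnalyzer",
    "de": "GermanAnalyzer",
}

# Fallback for any language that has no dedicated analyzer yet.
DEFAULT_ANALYZER = "StandardAnalyzer"


def analyzer_for(culture: str) -> str:
    """Return the analyzer name for a culture code such as "zh-CN".

    Only the language part before the first "-" is used for the lookup,
    so "zh-CN" and "zh-TW" share one entry. Unknown languages fall back
    to DEFAULT_ANALYZER.
    """
    language = culture.split("-")[0].lower()
    return ANALYZERS_BY_LANGUAGE.get(language, DEFAULT_ANALYZER)
```

A language the editors create later simply resolves to the default until someone adds a better entry to the map.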

Shannon Deminick 1526 posts 5272 karma points MVP 3x

Nov 04, 2014 @ 16:03

The document writing event gives you direct access to the Lucene document, so you should be able to index however you want in that handler. You could also create your own analyzer that wraps the underlying ones you want.
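
The wrapping-analyzer idea can be sketched as follows. This is a language-neutral illustration in Python, not Lucene.net code: analyzers are modelled as plain functions from text to tokens, the two toy analyzers and the field name `bodyText_zh` are assumptions, and the class is only similar in spirit to Lucene's `PerFieldAnalyzerWrapper`.

```python
from typing import Callable, Dict, List

# An "analyzer" here is just a function that turns text into tokens.
Analyzer = Callable[[str], List[str]]


def whitespace_analyzer(text: str) -> List[str]:
    """Toy analyzer for space-delimited languages: lowercase, split on whitespace."""
    return text.lower().split()


def bigram_analyzer(text: str) -> List[str]:
    """Toy analyzer for text without word spacing: overlapping character bigrams."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


class PerFieldAnalyzer:
    """Wraps several analyzers and picks one based on the field name."""

    def __init__(self, default: Analyzer, per_field: Dict[str, Analyzer]) -> None:
        self._default = default
        self._per_field = dict(per_field)

    def analyze(self, field: str, text: str) -> List[str]:
        # Fields without a dedicated analyzer use the default one.
        analyzer = self._per_field.get(field, self._default)
        return analyzer(text)


# Hypothetical language-specific field name.
wrapper = PerFieldAnalyzer(whitespace_analyzer, {"bodyText_zh": bigram_analyzer})
```

With this shape, the set of language-specific fields can grow at runtime, because the wrapper only needs a new entry in its dictionary rather than a change to a config file.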

Morten Bock 1867 posts 2140 karma points MVP 2x admin c-trib

Nov 04, 2014 @ 16:45

Thanks Shannon, I will take a look at that.

One thing I just thought of is that I could create one index per analyzer, and then dispatch my documents and searches to the appropriate index based on the language. That way I can define my indexes at design time, using the standard config. Then I just need to add a field to each indexed document with the actual language, so I can restrict my search to that language.
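
A rough sketch of that dispatch approach, again in Python for illustration only and not using Examine: the index names, the language-to-index map, and the `IndexRouter` class are all hypothetical, and each "index" is just an in-memory list with a naive substring match standing in for a real query.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical map from language code to an index defined at design time.
# Several languages can share one index when they share an analyzer.
INDEX_BY_LANGUAGE = {"zh": "CjkIndex", "ja": "CjkIndex"}
DEFAULT_INDEX = "StandardIndex"


def index_for(language: str) -> str:
    """Pick the index whose analyzer suits the language."""
    return INDEX_BY_LANGUAGE.get(language, DEFAULT_INDEX)


class IndexRouter:
    """Sends documents and searches to one index per analyzer."""

    def __init__(self) -> None:
        self._indexes: Dict[str, List[dict]] = defaultdict(list)

    def add(self, doc_id: int, language: str, text: str) -> None:
        # Store the language on the document so searches can filter on it.
        self._indexes[index_for(language)].append(
            {"id": doc_id, "language": language, "text": text}
        )

    def search(self, language: str, term: str) -> List[int]:
        docs = self._indexes[index_for(language)]
        return [
            d["id"]
            for d in docs
            # Restrict to the requested language, since an index is shared.
            if d["language"] == language and term.lower() in d["text"].lower()
        ]
```

The language filter matters because English and Danish documents would both land in the same default index here, so a search without it would mix results across languages.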
