Index contains no Chinese nodes in frontend
I have a website in three languages: English, German and Chinese.
The website has one index that handles all search requests.
My issue is that English and German both work fine, but Chinese gives no hits. The funny thing is that if I run the same search in the backend, I get hits.
When I do the search in the frontend I do it like this:
The query string looks like this: globalsearch?query=[chinese characters] (for some reason the forum removes the Chinese characters)
And the output is
But the same query in the backend, in the ExternalSearcher, gives me 3 hits.
Is there any logical explanation for this?
Hi Sebastian,
If the ex:Search() extension is using the same search index etc. as the backend (I have no idea if that is in fact the case), my first check would be to see if the Chinese nodes are in fact published (XSLT can only "see" published content). Next up, I'd try to find out if the extension is written with limited encoding support (unintentionally, probably), but that's a little harder to figure out...
/Chriztian
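To make the encoding concern concrete, here is a minimal Python sketch (not Umbraco or ex:Search() code) of what happens to a Chinese query string when it is percent-encoded as UTF-8 and then decoded with the wrong charset. The phrase and the variable names are made up for illustration; whether ex:Search() actually has this problem is not known from the thread.

```python
from urllib.parse import quote, unquote

# Hypothetical search phrase: the two characters for "test".
query = "\u6d4b\u8bd5"

# Percent-encode the phrase as UTF-8 for use in a query string.
encoded = quote(query, encoding="utf-8")
url = "globalsearch?query=" + encoded

# Decoding with the same charset restores the original phrase.
decoded_ok = unquote(encoded, encoding="utf-8")

# Decoding with a single-byte charset produces six unrelated characters,
# which would not match anything stored in the index.
decoded_bad = unquote(encoded, encoding="latin-1")

print(url)
print(decoded_ok == query, decoded_bad == query)
```

If the search term that reaches the searcher looks like the second case, the index can be perfectly fine and the frontend would still return 0 hits.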
Hi, Sebastian,
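Check the index itself to see if it has any of the Chinese in there. I suspect it doesn't. At least, not the index for the public website (rather than the backoffice... they are different indexes).
Go to the Developer section.
Select the Examine Management tab.
Expand the Searchers > InternalSearcher item.
Search for some Chinese. I bet you find it. This is the index used by the back office.
Expand the Searchers > ExternalSearcher item.
Search for the same Chinese. Do you find it, or is it missing? This is the index used by the website for searching.
If it is there in the Internal index but not the External index you'll want to look at the settings for the indexer. These resources will help: https://our.umbraco.org/search?q=examine&cat=documentation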
Let us know what you find out!
cheers,
doug.
Hi Doug
The internal searcher gives me 1 hit and the external searcher gives me 3 hits when I search for the same Chinese phrase.
But in the frontend, ex:Search() returns 0 hits. That's what really puzzles me.
At the risk of offending our XSLT preferences... you might give ezSearch a try. It is like XSLTsearch only better because it uses Examine (and has limited media searching as well, if you want it).
https://our.umbraco.org/projects/website-utilities/ezsearch/
Easy to extend and debug with VisualStudio if you want to set a breakpoint and figure out what is (or isn't) getting to your query and what is (or isn't) coming back.
It installs a doctype, macro partial file, template, and macro. All are called ezSearch so it shouldn't get in the way of anything else in your site. Though testing on a copy of the site is always prudent :)
It might be a way to compare if ex:Search() is doing what you think it's doing. ezSearch should return the same thing as you get via the Developer section.
cheers,
doug.
Finally I had the time to try out ezSearch.
Unfortunately, it also returns 0 hits in the frontend.
So right now it looks like Umbraco doesn't support searching on Chinese characters out of the box.
Hi, Sebastian,
I just tried making a simple page in a 7.3.0 site. I named the page "This is a test Chinese page" so I could find it easily in the index. It was in both the internal and external indexes immediately, as seen in the Examine Management.
In the bodyText property of the page (a richtext editor in my case) I added a Google translation of "This is a test" in both simplified and traditional Chinese, according to https://translate.google.co.uk/?ie=UTF-8&hl=en&client=tw-ob#en/zh-CN/This%20is%20a%20test
Saved and published the page and again the characters were in the indexes as expected.
And ezSearch had no problem at all finding them, as you'd expect. For that matter, any Lucene/Examine search would find them.
I wonder, do you have some odd Analyzer setting that is removing Chinese characters from your indexes? I'm using the default settings, which uses the WhitespaceAnalyzer.
cheers,
doug.
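To illustrate why the analyzer question matters, here is a toy Python sketch. It does not use Lucene or Examine. The two tokenizers are simplified stand-ins written for this example, and the sample phrase is made up. It shows how a Chinese sentence written without spaces becomes a single token under whitespace splitting, so an exact-token search for a word inside it finds nothing, while per-character tokenization makes that word findable.

```python
def whitespace_tokens(text):
    """Split on whitespace only (a stand-in for a whitespace-style analyzer)."""
    return text.split()


def per_character_tokens(text):
    """Emit each CJK ideograph as its own token and keep other words whole."""
    tokens = []
    for word in text.split():
        buffer = ""
        for ch in word:
            if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


# Hypothetical field value: "This is a test", written without spaces.
title = "\u8fd9\u662f\u4e00\u4e2a\u6d4b\u8bd5"
# Hypothetical search term: "test".
term = "\u6d4b\u8bd5"

ws = whitespace_tokens(title)
pc = per_character_tokens(title)

# Exact-token match against the whitespace tokens.
found_ws = term in ws
# Simplified per-character match: every character of the term is a token.
found_pc = all(ch in pc for ch in term)

print(len(ws), found_ws)
print(len(pc), found_pc)
```

The per-character match here ignores character order, so it is looser than a real phrase query. The point is only that the tokenization used at index time and at search time decides whether a sub-phrase can match at all.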
Very weird. I haven't changed anything in the config files, so they should be pretty much default.
This is how it looks in the frontend (attachment 1). In the textarea you can "see" the raw output of:
And the backend (attachment 2)
Could I have a look at your config files, just to see if something is messed up?
I notice that you don't have a 'bodyText' field, which I do. I'm not an expert on the ex:Search() extension but I wonder... what fields in the index are being searched?
At least with ezSearch, you can (and should!) specify the fields to search in, though the default for ezSearch is the nodeName, metaKeywords, metaDescription, and bodyText fields. Which is why I was able to search for the foreign characters.
For your situation it looks as though those fields either don't exist or aren't relevant and that you'd want to specifically search in the metaKeywords, metaDescription, metaTitle, navigationTitle and header, at least given the data for these two nodes.
For a quick test... make a new node with a nodeName of something in Chinese. Then search for that. I bet it's found even though it's in Chinese because (I'm guessing but would be very surprised if it isn't the case) the nodeName is going to be searched by default.
Let us know what you find out.
cheers,
doug.
Well, in ExamineIndex.config, right above the ExternalIndexSet, I see this comment:
<!-- Default Indexset for external searches, this indexes all fields on all types of nodes-->
Which indicates to me that all fields are indexed if nothing else is specified.
But I actually have 1 page whose nodeName is in Chinese, and that page I can find. So there seems to be something fishy.
99% of the Chinese pages have an English nodeName because of the URLs, and we specify the Chinese name in another field (navigationTitle). So if I add navigationTitle to the indexed fields, my problems might go away.
Update
When searching with Chinese characters this is the case:
My index is set up like this:
So Chinese nodes are only found if the nodeName is in Chinese. And since 99% of the nodes have an English nodeName and only a Chinese navigationTitle, they're not found, even though navigationTitle is specified in the index settings.
That's the behaviour I thought you might find. And it's good news that the search is working in a predictable way, even if it isn't the way you want.
Given that you see the properties in the index (from a screenshot earlier in the thread) I don't think it is something that has to do with what is indexed. Rather, it has to do with what is searched.
This is implementation-specific. With ezSearch, we have a macro parameter that lets you specify which doctype properties to search within. I'm not sure how your installation does that. But that's where I'd look. I suspect you aren't actually searching within the navigationTitle and other important fields.
Some links that might be helpful if you haven't already seen them:
https://github.com/Shazwazza/Examine
https://github.com/Shazwazza/Examine/wiki
https://our.umbraco.org/documentation/Reference/Searching/Examine/
https://our.umbraco.org/documentation/Reference/Searching/Examine/overview-explanation
That last link is probably the most important one.
Let us know what you find out!
cheers,
doug.
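The "indexed versus searched" distinction can be sketched with a toy in-memory index in Python. This is not Examine code. The search function, the document ids, and the field values are invented for illustration, with only the field names nodeName and navigationTitle taken from the thread.

```python
def search(index, term, fields):
    """Return ids of documents where any of the listed fields contains the term."""
    hits = []
    for doc in index:
        if any(term in doc.get(field, "") for field in fields):
            hits.append(doc["id"])
    return hits


# Both fields are stored ("indexed") for every document.
index = [
    # A page whose nodeName is itself Chinese.
    {"id": 1, "nodeName": "\u6d4b\u8bd5", "navigationTitle": "\u6d4b\u8bd5"},
    # A page with an English nodeName and a Chinese navigationTitle.
    {"id": 2, "nodeName": "test-page", "navigationTitle": "\u6d4b\u8bd5\u9875"},
]

term = "\u6d4b\u8bd5"  # hypothetical search term: "test"

# Searching only nodeName misses the page whose Chinese text lives elsewhere.
only_name = search(index, term, ["nodeName"])

# Adding navigationTitle to the searched fields finds both pages.
with_nav = search(index, term, ["nodeName", "navigationTitle"])

print(only_name)
print(with_nav)
```

This matches the behaviour described above: the data is in the index, but a page is only returned when the field that holds the Chinese text is among the fields being searched.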
Sebastian,
Were you able to come up with a solution?
Alex