  • Connie DeCinko 931 posts 1159 karma points
    Feb 17, 2017 @ 16:52

    How to ignore small words such as THE

    Is there a way I can exclude or ignore small words such as A or THE? Since nearly every page has at least one occurrence of the word THE, it will come back in the results for "By the Sea".

  • Douglas Robar 3570 posts 4671 karma points MVP ∞ admin c-trib
    Feb 17, 2017 @ 17:15

    Hi, Connie,

    A few ideas here. First, if you want to search for a phrase, enclose it in quotes (single or double quotes, it doesn't matter). That way you aren't searching for each word but for the phrase. Any words not in quotes are treated as individual words, as you've found.

    Thus, searching for by the sea is really searching for documents that contain all three words, whereas searching for "by the sea" is searching for documents that contain that specific phrase.

    If that's not sufficient, you could alter XSLTsearch to support stop words. This was on my TODO list, but I never got around to it and few have asked for it. Stop words are a concept built into ezSearch and its Lucene-based Examine indexes, so you may find that switching from XSLTsearch to ezSearch resolves the problem all by itself. ezSearch is even faster than XSLTsearch and is written in Razor rather than XSLT. If you do want to modify it, you may find that easier with a working knowledge of Razor/C#, since few people these days know much about XSLT.

    If you did want to implement a stop words list in XSLTsearch, I'd create a new function in the xsltsearch.cs file and call it in the xsltsearch.xslt file, where the variable named unescapedSearch is set near the top of the file.

    Basically, your function would remove any unwanted terms, such as 'a', 'the', 'an', 'of', and so on, from the list of search terms, but only when they are individual words and not part of a quoted phrase.
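
    A minimal sketch of that filtering logic, assuming the query arrives as one string. It is written in Python for brevity and would need porting to C# to live in xsltsearch.cs. The names STOP_WORDS and remove_stop_words are hypothetical and not part of XSLTsearch.

```python
import re

# Hypothetical stop-word list; extend it to suit your site.
STOP_WORDS = {"a", "an", "the", "of"}

# Matches a double-quoted phrase, a single-quoted phrase, or a bare word.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|\S+')


def remove_stop_words(query):
    """Drop stand-alone stop words; leave quoted phrases untouched."""
    kept = []
    for token in TOKEN.findall(query):
        # A phrase starts and ends with the same quote character.
        is_phrase = (
            len(token) >= 2 and token[0] in "\"'" and token[-1] == token[0]
        )
        if is_phrase or token.lower() not in STOP_WORDS:
            kept.append(token)
    return " ".join(kept)


print(remove_stop_words('by the sea'))
print(remove_stop_words('"by the sea" the beach'))
```

    Note that a query made up only of stop words comes back empty, so the caller would need to decide what to do in that case.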

    Hope this helps point you in the direction that will give you the best results.

    cheers,
    doug.

  • Connie DeCinko 931 posts 1159 karma points
    Feb 17, 2017 @ 17:46

    Doug, thanks for the quick response. We have plans to swap out the search for the newer tech, but I was told to see if I can “fix” XSLTsearch for now.

    Also, I’m thinking that in an ideal world you would first treat the entered words automatically as a phrase (“CLE by the Sea”) and put those results at the top of the list, then search for any page with those words anywhere on the page and put those results last. So for something like “trust wills death”, the phrase search will find nothing, but the search for the individual words will find something.

    It means running the search twice and combining the results. When I have some free time, I may play with it.
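
    The two-pass idea could look roughly like the following sketch. It uses a toy in-memory dictionary of pages and plain word matching as a stand-in for the real search, so every function name here is hypothetical and none of it is XSLTsearch code.

```python
def _words(text):
    # Lower-case and split on whitespace; punctuation is ignored in this toy.
    return text.lower().split()


def search_phrase(pages, query):
    """Pages whose text contains the query words as one consecutive phrase."""
    words = _words(query)
    if not words:
        return []
    phrase = " " + " ".join(words) + " "
    return [name for name, text in pages.items()
            if phrase in " " + " ".join(_words(text)) + " "]


def search_all_words(pages, query):
    """Pages whose text contains every query word, in any order."""
    words = _words(query)
    if not words:
        return []
    return [name for name, text in pages.items()
            if all(w in _words(text) for w in words)]


def two_pass_search(pages, query):
    """Exact phrase matches first, then the remaining all-words matches."""
    exact = search_phrase(pages, query)
    loose = [n for n in search_all_words(pages, query) if n not in exact]
    return exact + loose


pages = {
    "events": "Join us for CLE by the Sea this summer",
    "estate": "death and wills and a trust",
    "about": "the sea is by the office",
}
print(two_pass_search(pages, "by the sea"))
print(two_pass_search(pages, "trust wills death"))
```

    Filtering the second pass against the first keeps a page from showing up twice in the combined list.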

    Thanks!

  • Douglas Robar 3570 posts 4671 karma points MVP ∞ admin c-trib
    Feb 17, 2017 @ 19:10

    Hi, Connie,

    Sure, you could do the double search as you mention. It'll make the search take twice as long but if you don't have a massive site that isn't going to be a problem because you'll still be in sub-second response times.

    Here's a crazy idea... I wonder if it might be sneakily easy to do. Could you put two XSLTsearch macros on the search results page? The first one would put quotes around all the search terms, and the second would use the regular search.

    You'd get two sets of results on the page: "Exact matches" and "Matches all terms". You wouldn't need to modify XSLTsearch at all in that case, nor merge the results into a common list. They would just be two different parts of the results page.

    Not entirely elegant but... might serve the immediate need?

    Let us know how you finally solve your need, for the benefit of the next person (and because I'm super-curious what you come up with)!

    cheers,
    doug.
