  • Sean Dooley 288 posts 527 karma points
    Dec 20, 2017 @ 08:59

    Examine filtered search for exact phrase

    I'm using Examine with a custom data set from an external source, and I'm trying to query a category field to find an exact phrase (which happens to include a stop word).

    Below is my Examine configuration

    ExamineSettings.config

    <Examine>
        <ExamineIndexProviders>
            <providers>
                ...
                <add name="CustomIndexer" 
                    type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine"
                    dataService="Custom.Indexers.CustomDataService, Custom"
                    indexTypes="CustomContent" 
                    runAsync="true" />
                ...
            </providers>
        </ExamineIndexProviders>
        <ExamineSearchProviders defaultProvider="ExternalSearcher">
            ...
            <add name="CustomSearcher"
                type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine"
                analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
                enableLeadingWildcard="true"
                indexSets="CustomIndexSet" />
            ...
        </ExamineSearchProviders>
    </Examine>
    

    ExamineIndex.config

    <ExamineLuceneIndexSets>
        ...
        <IndexSet SetName="CustomIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Custom/">
            <IndexUserFields>
                <add Name="id" />
                <add Name="name" />
                <add Name="title" />
                <add Name="url" />
                <add Name="created" />
                <add Name="modified" />
                <add Name="_brand" />
                <add Name="brand" />
                <add Name="_category" />
                <add Name="category" />
            </IndexUserFields>
        </IndexSet>
        ...
    </ExamineLuceneIndexSets>
    

    In my ExamineLuceneIndexSet I've created two fields: _category, which stores the categories as a comma-separated list, and category, which stores the categories with the commas replaced by spaces.

    I'm then running a filtered search whereby the user can search by a combination of term (text input), brand (checkbox list: "Mornobit", "Durndrop") and/or category (checkbox list: "Bug and Tar", "Headlights", "Scratch Repair").

    Below is the code that I'm currently using.

    var term = Request.QueryString["term"];
    var brand = Request.QueryString["brand"];
    var category = Request.QueryString["category"];
    
    IBooleanOperation filter = null;
    var searchProvider = "CustomSearcher";
    var searchProviderCollection = ExamineManager.Instance.SearchProviderCollection[searchProvider];
    var searchCriteria = searchProviderCollection.CreateSearchCriteria(BooleanOperation.Or);
    
    if (!string.IsNullOrEmpty(term))
    {
        filter = searchCriteria.GroupedOr(new[] { "name", "title", "keywords", "taxKeyword" }, term.Boost(10.0f));
    }
    
    if (!string.IsNullOrEmpty(brand))
    {
        filter = filter == null
            ? searchCriteria.GroupedOr(new[] { "brand" }, brand.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries))
            : filter.And().GroupedOr(new[] { "brand" }, brand.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries));
    }
    
    if (!string.IsNullOrEmpty(category))
    {
        filter = filter == null
            ? searchCriteria.GroupedOr(new[] { "category" }, category.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries))
            : filter.And().GroupedOr(new[] { "category" }, category.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries));
    }
    
    var results = filter == null
        ? searchProviderCollection.Search("*", true)
            .OrderByDescending(x => x.Score)
            .TakeWhile(x => x.Score > minimumScore)
            .ToList()
        : searchProviderCollection
            .Search(filter.Compile())
            .OrderByDescending(x => x.Score)
            .TakeWhile(x => x.Score > minimumScore)
            .ToList();
    

    With the above code, I get the following results with various analyzers:

    StandardAnalyzer (this removes stop words)

    • {{ SearchIndexType: , LuceneQuery: (category:"bug tar") }}
    • No results

    SimpleAnalyzer

    • {{ SearchIndexType: , LuceneQuery: (category:"bug and tar") }}
    • No results

    WhitespaceAnalyzer

    • {{ SearchIndexType: , LuceneQuery: (category:"Bug and Tar") }}
    • No results

    Ideally, when I search for "Bug and Tar", I need to find that exact phrase. I know it exists because, when querying in the Examine Management dashboard, I can see _category: Solutions,Bug and Tar and category: Solutions Bug and Tar for some entries.

    I'd really appreciate any advice on what I need to do to get this search to work.

    UPDATE

    After playing around with various settings, I updated my CustomSearcher as follows

    <add name="CustomSearcher"
        type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine"
        enableLeadingWildcard="true"
        indexSets="CustomIndexSet" />
    

    Now the search looks like

    • {{ SearchIndexType: , LuceneQuery: +(category:"bug ? tar" category:bug category:tar) }}
    • Returns results

    I still need to do further testing with various filter combinations, but I feel that I have stumbled upon a possible solution.

  • Ravi Motha 290 posts 500 karma points MVP 7x c-trib
    Dec 20, 2017 @ 11:42

    Have you played around with the Examine indexer tool in the back office to see whether you get results for your search, or variations of it, i.e. to make sure the index is built and contains that value?

    Ravi

  • Sean Dooley 288 posts 527 karma points
    Dec 20, 2017 @ 21:45

    Yes Ravi, I used Examine Management in the back office to confirm the values are present, and tested.

  • Tom Steer 161 posts 596 karma points
    Dec 20, 2017 @ 12:05

    Hey Sean,

    If you are passing the exact category values to the search anyway, then I would say the easiest thing to do would be to insert a field for each category value during DocumentWriting and set it to be not analyzed. Something like this:

    private void MyIndexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
    {
        if (e.Fields.ContainsKey("category"))
        {
            var facetValue = e.Fields["category"];
            var values = facetValue.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries).ToList();
            foreach (var value in values)
            {
                e.Document.Add(new Field("categoryNonAnalyzed", value.Trim(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            }
        }
    }
    

    Then you can use the categoryNonAnalyzed field to search on.
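
    To make the idea concrete, here is a minimal, self-contained C# sketch of what "not analyzed" implies for matching. It does not use Lucene or Examine at all: the dictionary is a hypothetical stand-in for the term index of the categoryNonAnalyzed field, and the class and method names are invented for illustration. The assumption it demonstrates is that each value is kept as one whole term, so only an exact, case-sensitive match finds it.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class NotAnalyzedSketch
    {
        // Hypothetical stand-in for a term index: term -> ids of documents holding it.
        private static readonly Dictionary<string, HashSet<int>> Terms =
            new Dictionary<string, HashSet<int>>(StringComparer.Ordinal);

        // Mimics adding one not-analyzed field value: the whole string is a single term.
        public static void AddValue(int docId, string value)
        {
            HashSet<int> docs;
            if (!Terms.TryGetValue(value, out docs))
            {
                docs = new HashSet<int>();
                Terms[value] = docs;
            }
            docs.Add(docId);
        }

        // Exact term lookup: no lowercasing, no stop-word removal, no splitting.
        public static bool Matches(int docId, string term)
        {
            HashSet<int> docs;
            return Terms.TryGetValue(term, out docs) && docs.Contains(docId);
        }

        public static void Main()
        {
            // One document (id 1) with two category values, added one at a time.
            var raw = "Headlights,Bug and Tar";
            foreach (var value in raw.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
            {
                AddValue(1, value.Trim());
            }

            Console.WriteLine(Matches(1, "Bug and Tar")); // True
            Console.WriteLine(Matches(1, "bug and tar")); // False (case differs)
            Console.WriteLine(Matches(1, "Bug"));         // False (not the whole value)
        }
    }
    ```

    The practical consequence is that the search side has to pass the value through untouched; anything that lowercases the input or strips stop words before the lookup would produce a term that is not in the index.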

  • Sean Dooley 288 posts 527 karma points
    Dec 20, 2017 @ 22:27

    Hey Tom

    I've setup the following

    private static void CustomIndexer_DocumentWriting(object sender, Examine.LuceneEngine.DocumentWritingEventArgs e)
    {
        var values = e.Fields["_category"].Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries).ToList();
        foreach (var value in values)
        {
            e.Document.Add(new Field("categoryList", value.Trim(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        }
    }
    

    I know one of the items has two categories selected, "Headlights" and "Bug and Tar". When using the Examine Management dashboard, I'm seeing the following

    categoryList: Headlights

    Does the field name need to be unique for each value?

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Dec 20, 2017 @ 12:07

    Sean,

    Fire up Luke, or even in the back office, try the following Lucene query:

    category:bug category:tar
    

    You are doing a phrase search with

    category:"bug tar"
    

    and in your index the stored value is "bug and tar", although the indexed tokens will be "bug tar" when using the standard analyser.
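
    As a rough illustration of the tokens involved, the sketch below imitates the three analyzers from the question in plain C#. It is a simplification, not Lucene's implementation: the stop-word list is a small assumed subset, the real StandardAnalyzer has more tokenising rules, and the class and method names are made up for the example.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    public static class AnalyzerSketch
    {
        // Assumed subset of the English stop words; the real list is longer.
        private static readonly HashSet<string> StopWords =
            new HashSet<string> { "a", "an", "and", "of", "or", "the", "to" };

        // WhitespaceAnalyzer-like: split on whitespace only, keep case.
        public static List<string> Whitespace(string text)
        {
            return text.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries).ToList();
        }

        // SimpleAnalyzer-like: split on non-letters and lowercase.
        public static List<string> Simple(string text)
        {
            var tokens = new List<string>();
            var current = new StringBuilder();
            foreach (var c in text)
            {
                if (char.IsLetter(c))
                {
                    current.Append(char.ToLowerInvariant(c));
                }
                else if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
            }
            if (current.Length > 0)
            {
                tokens.Add(current.ToString());
            }
            return tokens;
        }

        // StandardAnalyzer-like (approximation): lowercase tokens minus stop words.
        public static List<string> Standard(string text)
        {
            return Simple(text).Where(t => !StopWords.Contains(t)).ToList();
        }

        public static void Main()
        {
            const string value = "Solutions Bug and Tar";
            Console.WriteLine(string.Join(" | ", Standard(value)));   // solutions | bug | tar
            Console.WriteLine(string.Join(" | ", Simple(value)));     // solutions | bug | and | tar
            Console.WriteLine(string.Join(" | ", Whitespace(value))); // Solutions | Bug | and | Tar
        }
    }
    ```

    These token lists line up with the three LuceneQuery strings in the question. A phrase query only matches when the query tokens agree with the tokens produced at index time, so an index built with one analyzer and searched with another is a plausible, though unconfirmed, reason for the empty results.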

    Another thing: I'm not sure why you are ordering by score when the results are already ordered by score. Also, does searchProviderCollection.Search("*", true) do anything? It's the first time I have seen that.

    Regards

    Ismail

  • Sean Dooley 288 posts 527 karma points
    Dec 20, 2017 @ 22:00

    I was getting an error when no term, brand or category had been supplied, so after playing around for a bit I found that I could perform a search to grab everything by using searchProviderCollection.Search("*", true).

    Not sure if this is the best practice, but it is giving the expected results.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Dec 20, 2017 @ 12:08

    Following on from Tom's suggestion, see https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/89730-multi-value-index-search-on-tag-field for multi-value fields, which is effectively what you have.

  • Sean Dooley 288 posts 527 karma points
    Dec 20, 2017 @ 22:08

    Thanks Ismail. I'm assuming that the DocumentWriting event is available for a custom data set. I'll have a play around and see what I can come up with.
