  • Andy Welch 3 posts 73 karma points
    Nov 26, 2019 @ 19:02

    Examine MultiSearcher and PDFIndex not working

    Hi, I've followed the docs and implemented PDFIndex and MultiSearcher.

    In the backend these are both healthy and return results.

    However, when I use them in code I get zero results from the PdfIndex either directly or via the MultiSearcher.

    This is the code which creates the PdfIndex and MultiSearcher...

    public void Initialize()
    {
        // Get both the external and PDF indexes
        if (_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out var externalIndex)
            && _examineManager.TryGetIndex(PdfIndexConstants.PdfIndexName, out var pdfIndex))
        {
            // Register a multi searcher for both of them
            var multiSearcher = new MultiIndexSearcher("MultiSearcher", new IIndex[] { externalIndex, pdfIndex });
            _examineManager.AddSearcher(multiSearcher);
        }
    }
    

    This is how I'm searching the PdfIndex directly...

    var textFields = new[]
    {
        "title", "description", "content", "bodyText", "location", "pageHeading", "subHeading",
        "nodeName", "__NodeTypeAlias"
    };

    if (ExamineManager.Instance.TryGetIndex("PDFIndex", out index))
    {
        searcher = index.GetSearcher();

        var query = searcher.CreateQuery("media").GroupedOr(textFields, searchQuery.Fuzzy(0.2f));
        results = query.Execute();
    }
    

    ...and this is a search against the MultiSearcher

    if (_searchPdf && ExamineManager.Instance.TryGetSearcher("MultiSearcher", out searcher))
    {
        var query = searcher.CreateQuery("media,content").GroupedOr(textFields, searchQuery/*.Fuzzy(0.2f)*/);
        results = query.Execute();
    }
    

    I suspect I'm phrasing the search incorrectly. I'll continue to "fiddle" and strip the code down to minimum.

    Suggestions appreciated.

  • Andy Welch 3 posts 73 karma points
    Nov 26, 2019 @ 20:14

    Follow up on my experiments...

    Based on the fields I see in results when testing the PdfIndex in backoffice, I've tried the following textFields variation...

    var PdfTextFields = new[]
    {
        "nodeName", "fileTextContent"
    };
    

    still, sadly, with zero results.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Nov 27, 2019 @ 10:33

    Andy,

    I have a multi searcher over 3 indexes, including the PDF index, working. I suspect it's your query: try getting rid of the "media,content" bit. Also, your textFields do not include fileTextContent, which is where the extracted content is stored.

    Also, can you call query.ToString() and report back the actual generated Lucene query?
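
    For illustration, the suggestions above could be tried along these lines. This is only a rough sketch, assuming the Examine version that ships with Umbraco 8 and the "MultiSearcher" name registered in the original post; the PdfSearchSketch class and Run method are hypothetical names made up for the example, and it will only compile inside a project that references Examine.

    ```csharp
    using System.Diagnostics;
    using Examine;

    // Hypothetical helper class, not part of Umbraco or Examine.
    public static class PdfSearchSketch
    {
        public static void Run(string searchQuery)
        {
            // fileTextContent is the field where the PDF index stores extracted text.
            var pdfTextFields = new[] { "nodeName", "fileTextContent" };

            if (ExamineManager.Instance.TryGetSearcher("MultiSearcher", out var searcher))
            {
                // No category is passed to CreateQuery, so the "media,content" restriction is gone.
                var query = searcher.CreateQuery().GroupedOr(pdfTextFields, searchQuery);

                // Write out the generated query so it can be inspected and reported back.
                Debug.WriteLine(query.ToString());

                var results = query.Execute();
                Debug.WriteLine(results.TotalItemCount);
            }
        }
    }
    ```

    If this still returns nothing, the logged query string is the useful thing to post, since it shows exactly which fields and terms were sent to Lucene.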

  • Andy Welch 3 posts 73 karma points
    Nov 27, 2019 @ 20:15

    Thanks for your help Ismail, it's really appreciated. I'll need a couple days to get back to you.
