  • Eric Patton 21 posts 51 karma points
    Nov 06, 2012 @ 18:14

    Problems with PDFIndexer / PDFSearcher

    Hey guys! I've been searching for a solution to this for a couple of days now. I have a normal site search set up, and it's working perfectly. Our client decided they wanted to be able to search PDFs as well, so I went to the Examine documentation and found that it has a PDF indexer built right in! I set up the index set, indexer, and searcher in ExamineIndex.config and ExamineSettings.config:

    <!-- ExamineIndex.config -->
    <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/ExamineIndexes/PDFIndexSet/"/>

    <!-- ExamineSettings.config: indexer under ExamineIndexProviders, searcher under ExamineSearchProviders -->
    <add name="PDFIndexer"
         type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
         extensions=".pdf"
         umbracoFileProperty="umbracoFile"/>
    <add name="PDFSearcher"
         type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />

    I uploaded a couple of test PDFs to the Media section and republished the site to try to force indexing. However, no folder was created for the PDFIndexSet. Not only that, but my normal site search broke just from adding the above entries to the config files. It throws these exceptions:

    Error Loading Razor Script (file: Render Search Results) The type initializer for 'Examine.ExamineManager' threw an exception.
      at Examine.ExamineManager.get_Instance()
      at ASP._Page_macroScripts_RenderSearchResults_cshtml.Execute() in c:\inetpub\wwwroot\mdesumbraco\macroScripts\RenderSearchResults.cshtml:line 134
      at System.Web.WebPages.WebPageBase.ExecutePageHierarchy()
      at System.Web.WebPages.WebPage.ExecutePageHierarchy()
      at System.Web.WebPages.WebPageBase.ExecutePageHierarchy(WebPageContext pageContext, TextWriter writer, WebPageRenderingBase startPage)
      at umbraco.MacroEngines.RazorMacroEngine.ExecuteRazor(MacroModel macro, INode currentPage)
      at umbraco.MacroEngines.RazorMacroEngine.Execute(MacroModel macro, INode currentPage)

    Error loading MacroEngine script (file: RenderSearchResults.cshtml, Type: ''
    The type initializer for 'Examine.ExamineManager' threw an exception.
      at umbraco.macro.renderMacro(Hashtable pageElements, Int32 pageId)

    The line of code it is complaining about is here:

    var searcher = ExamineManager.Instance.SearchProviderCollection["MdesSearcher"];

    Once I comment out the PDF code from the config files, everything works fine and dandy. I tried rebuilding my internal index, but that ended up breaking the site completely as it wouldn't rebuild for some reason. All of my pages showed macro errors saying they couldn't find the files. Luckily, I had backed up the site before doing all of this, so I was able to restore it.

    Does anyone have any idea why PDF search isn't working for me? I'm using Umbraco 4.10 (the build right before the RC, because the client wants 4.10 and that was all I had to work with at the time). Thanks so much!

  • Eric Patton 21 posts 51 karma points
    Nov 06, 2012 @ 20:13

    Hey all, I've actually solved this one myself. It turns out that the UmbracoExamine.PDF DLL just wasn't included in the 4.10 build that I have. I downloaded it from http://examine.codeplex.com/releases/view/77362 (the "Umbraco Examine PDF binaries" link) and placed it in my Umbraco bin folder.

    And in case anyone is wondering which fields Examine returns from the PDF index, I was able to get the PDF text and the node ID from these two fields, respectively:

    • FileTextContent
    • __NodeId (yes, that's two underscores)
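    For anyone wiring these two fields into a results page, here is a minimal sketch of reading them from one search result. It assumes the result's fields arrive as a string-to-string dictionary (as Examine's SearchResult.Fields is); the class name PdfResultFields, the method Read, and the 100-character preview length are hypothetical choices for illustration, not part of Examine.

```csharp
using System;
using System.Collections.Generic;

public static class PdfResultFields
{
    // Field names reported in this thread for entries in the UmbracoExamine.PDF index
    public const string TextField = "FileTextContent";
    public const string NodeIdField = "__NodeId"; // two leading underscores

    // Reads the media node ID and a short text preview from one result's fields.
    // Assumption: fields is a string-to-string dictionary like Examine's SearchResult.Fields.
    public static (int? NodeId, string Preview) Read(IDictionary<string, string> fields, int previewLength = 100)
    {
        int? nodeId = null;
        if (fields.TryGetValue(NodeIdField, out var rawId) && int.TryParse(rawId, out var id))
        {
            nodeId = id;
        }

        // Substring(0, n) throws when the text is shorter than n, so clamp the length first
        string preview = string.Empty;
        if (fields.TryGetValue(TextField, out var text) && text != null)
        {
            preview = text.Substring(0, Math.Min(previewLength, text.Length));
        }

        return (nodeId, preview);
    }
}
```

    In a Razor loop over search results this would be called as something like PdfResultFields.Read(c.Fields); a missing or non-numeric __NodeId comes back as null instead of throwing.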
  • d Thomas 13 posts 33 karma points
    Mar 25, 2013 @ 10:30

    @ Eric Patton

    This worked great for indexing, searching and retrieving the results as well. The only thing I still need is a link to the documents in the media library.

    Here is my Razor script. Would you be able to help with this?

    @using Examine
    @using Examine.SearchCriteria
    @using UmbracoExamine
    @using UmbracoExamine.PDF
    @using umbraco.MacroEngines
    @using System.Xml.Linq
    @using umbraco.presentation.nodeFactory
    @using umbraco.cms
    @using umbraco.cms.businesslogic.media
    @inherits umbraco.MacroEngines.DynamicNodeContext

    @{
        var searchString = Request["searchstring"];
        if (!String.IsNullOrEmpty(searchString))
        {
            // Search the PDF index; the second argument turns on wildcard matching
            var searchResults = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"].Search(searchString, true);
            foreach (var c in searchResults)
            {
                <div>
                    @* __NodeId holds the ID of the media item the PDF belongs to *@
                    <p><a href="http://localhost:22841/Media/@c.Fields["__NodeId"]">@c.Fields["__NodeId"]</a></p>

                    @if (c.Fields.ContainsKey("FileTextContent"))
                    {
                        var bodyText = c.Fields["FileTextContent"];
                        // Guard against PDFs whose extracted text is shorter than 100 characters
                        <p>@Html.Raw(bodyText.Substring(0, Math.Min(100, bodyText.Length)))</p>
                    }
                </div>
            }
        }
    }

  • AJ 13 posts 72 karma points
    Jul 10, 2015 @ 19:38

    I just set up UmbracoExamine.PDF. Indexing and searching work, but when I search for several words that are present in the PDF document, such as "folder and archive", no results are returned.

    Any idea what I'm missing?

    Thanks in advance.

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Feb 03, 2016 @ 15:47

    On your query, do a grouped OR, e.g.:

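    // Needs: using System.Linq; using Examine.LuceneEngine.SearchCriteria; (for MultipleCharacterWildcard)
    var terms = queryTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(x => x.MultipleCharacterWildcard()).ToArray();
    var fields = new List<string> { "contents" };
    IBooleanOperation query = searchCriteria.GroupedOr(fields, terms);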
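    The grouped OR above matches a document when any listed field contains any of the query's words, each given a trailing wildcard. As a rough, framework-free illustration (an assumption, not Examine's own code), the sketch below writes that idea out as a raw Lucene query string. GroupedOrSketch and Build are hypothetical names, and for the PDF index discussed in this thread the field would presumably be FileTextContent rather than contents.

```csharp
using System;
using System.Linq;

public static class GroupedOrSketch
{
    // Illustration only: builds an OR of "field:word*" clauses for every word and field.
    // Examine builds query objects directly, so the query it actually runs may differ
    // in detail. This sketch does not escape Lucene special characters.
    public static string Build(string input, params string[] fields)
    {
        // Split on spaces and drop empty entries so repeated spaces are ignored
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var clauses = from field in fields
                      from word in words
                      select field + ":" + word + "*";

        // With the query parser's default OR operator, space-separated clauses are alternatives
        return "(" + string.Join(" ", clauses) + ")";
    }
}
```

    For example, GroupedOrSketch.Build("folder archive", "FileTextContent") returns "(FileTextContent:folder* FileTextContent:archive*)".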
    