Eric Patton 21 posts 51 karma points

Nov 06, 2012 @ 18:14

Problems with PDFIndexer / PDFSearcher

Hey guys! I've been searching for a solution to this for a couple of days now. I have a normal site search set up, and it's working perfectly. Our client decided that they wanted to be able to search PDFs as well, so I went to the Examine documentation and found that there's one built right in! I set up the index set, indexer, and searcher in the ExamineIndex.config and ExamineSettings.config:

<IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/ExamineIndexes/PDFIndexSet/"/>
<add name="PDFIndexer"
     type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
     extensions=".pdf"
     umbracoFileProperty="umbracoFile"/>
<add name="PDFSearcher"
     type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>

I uploaded a couple of test PDFs to the Media section and republished the site to try to force a reindex, but no folder was created for the PDFIndexSet. Worse, my normal site search broke as soon as I added the above to the config files. It throws these exceptions:

Error Loading Razor Script (file: Render Search Results)
The type initializer for 'Examine.ExamineManager' threw an exception.
  at Examine.ExamineManager.get_Instance()
  at ASP._Page_macroScripts_RenderSearchResults_cshtml.Execute() in c:\inetpub\wwwroot\mdesumbraco\macroScripts\RenderSearchResults.cshtml:line 134
  at System.Web.WebPages.WebPageBase.ExecutePageHierarchy()
  at System.Web.WebPages.WebPage.ExecutePageHierarchy()
  at System.Web.WebPages.WebPageBase.ExecutePageHierarchy(WebPageContext pageContext, TextWriter writer, WebPageRenderingBase startPage)
  at umbraco.MacroEngines.RazorMacroEngine.ExecuteRazor(MacroModel macro, INode currentPage)
  at umbraco.MacroEngines.RazorMacroEngine.Execute(MacroModel macro, INode currentPage)

Error loading MacroEngine script (file: RenderSearchResults.cshtml, Type: ''
The type initializer for 'Examine.ExamineManager' threw an exception.
  at umbraco.macro.renderMacro(Hashtable pageElements, Int32 pageId)

The line of code it is complaining about is here:

var searcher = ExamineManager.Instance.SearchProviderCollection["MdesSearcher"];

Once I comment out the PDF code from the config files, everything works fine and dandy. I tried rebuilding my internal index, but that ended up breaking the site completely as it wouldn't rebuild for some reason. All of my pages showed macro errors saying they couldn't find the files. Luckily, I had backed up the site before doing all of this, so I was able to restore it.

Does anyone have any idea why PDF search isn't working for me? I'm using 4.10 (the release right before the RC, because the client wants 4.10 and that was all I had to start working with at the time). Thanks so much!

Eric Patton 21 posts 51 karma points

Nov 06, 2012 @ 20:13

Hey all, I've actually solved this one myself. It turns out that the UmbracoExamine.PDF dll just wasn't included in the 4.10 build that I have. I downloaded it from http://examine.codeplex.com/releases/view/77362 (the "Umbraco Examine PDF binaries" link) and placed it in Umbraco's bin folder.

And just in case anyone is wondering what properties Examine returns from the indexing, I was able to get the text from the PDFs and the node ID from these two respectively:

FileTextContent
__NodeId (yes, that's two underscores)
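
For reference, here is a rough sketch of where the fragments from the first post would sit once the dll is in place. The wrapper elements and the defaultProvider value are assumptions based on a stock Umbraco 4.x install and may differ in your files. Neither provider sets an indexSet attribute, so this also assumes Examine pairs PDFIndexer and PDFSearcher with PDFIndexSet by their shared name prefix.

```xml
<!-- config/ExamineIndex.config (root element assumed from a stock install) -->
<ExamineLuceneIndexSets>
  <!-- existing index sets stay as they are -->
  <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/ExamineIndexes/PDFIndexSet/"/>
</ExamineLuceneIndexSets>

<!-- config/ExamineSettings.config (wrapper elements assumed from a stock install) -->
<Examine>
  <ExamineIndexProviders>
    <providers>
      <!-- existing indexers stay as they are -->
      <!-- needs UmbracoExamine.PDF.dll in /bin, otherwise ExamineManager fails to initialise -->
      <add name="PDFIndexer"
           type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
           extensions=".pdf"
           umbracoFileProperty="umbracoFile"/>
    </providers>
  </ExamineIndexProviders>
  <!-- defaultProvider shown here is an assumption; keep whatever your file already has -->
  <ExamineSearchProviders defaultProvider="InternalSearcher">
    <providers>
      <!-- existing searchers stay as they are -->
      <add name="PDFSearcher"
           type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>
    </providers>
  </ExamineSearchProviders>
</Examine>
```

If the type named in a provider entry cannot be loaded, every searcher is affected, not just the PDF one. That matches the broken site search described in the first post.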

d Thomas 13 posts 33 karma points

Mar 25, 2013 @ 10:30

@Eric Patton

This worked great for indexing and searching, and for retrieving the results as well. The only thing I still need is a link to the documents in the media library.

Here is my Razor script. Would you be able to help with this?

@using Examine
@using Examine.SearchCriteria
@using UmbracoExamine
@using UmbracoExamine.PDF
@using umbraco.MacroEngines
@using System.Xml.Linq
@using umbraco.presentation.nodeFactory
@using umbraco.cms
@using umbraco.cms.businesslogic.media;
@inherits umbraco.MacroEngines.DynamicNodeContext

@{
    var searchString = Request["searchstring"];
    if (!String.IsNullOrEmpty(searchString))
    {
        var searchResults = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"].Search(searchString, true);
        foreach (var c in searchResults)
        {
            <div>
                <p><a href="http://localhost:22841/Media/@c.Fields[@"__NodeId"]">@c.Fields[@"__NodeId"]</a></p>

                @if (c.Fields.Keys.Contains("FileTextContent"))
                {
                    var bodyText = c.Fields["FileTextContent"];
                    // Guard against PDFs with fewer than 100 characters of extracted text
                    <p>@Html.Raw(bodyText.Substring(0, Math.Min(100, bodyText.Length)))</p>
                }
            </div>
        }
    }
}

AJ 13 posts 72 karma points

Jul 10, 2015 @ 19:38

I just set up UmbracoExamine.PDF. Indexing and searching are working, but when I search for a phrase such as "folder and archive", or any combination of multiple words present in the PDF document, no results are returned.

Any idea what I am missing?

Thanks in advance.

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Feb 03, 2016 @ 15:47

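On your query, do a grouped OR, e.g.:

// Wildcard each word of the query (needs System.Linq and Examine.LuceneEngine.SearchCriteria)
var terms = queryTerm.Split(' ').Select(x => x.MultipleCharacterWildcard()).ToArray();
var fields = new List<string> { "contents" };
IBooleanOperation query = searchCriteria.GroupedOr(fields, terms);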
