howto add pdf searching to ezsearch

Ben Norman 167 posts 276 karma points

Oct 09, 2015 @ 03:03

Howto: Add PDF Searching to ezSearch

Install-Package UmbracoCms.UmbracoExamine.PDF

Then update /Views/MacroPartials/ezSearch.cshtml around line 145

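// searcherPdf is assumed to have been obtained earlier from Examine's search providers for the PDF index.
// Filter against results and materialise with ToList(): a lazy filter that reads model.AllResults while it is being reassigned would re-enumerate itself.
model.AllResults = results;
var pdfResults = searcherPdf.Search(model.SearchTerm, true).Where(i => !results.Any(j => j.Id == i.Id)).ToList();
model.AllResults = model.AllResults.Concat(pdfResults).OrderByDescending(i => i.Score).ToList();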

Then, around line 230:

case UmbracoExamine.IndexTypes.Media:
    var mediaItem = Umbraco.TypedMedia(result.Id);
    @RenderMediaResult(model, mediaItem)
    break;

Thanks for the package, Matt! BTW, buy a licence for iText if your project is not open source!
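
For anyone who wants to try the merge step from the line 145 change outside of Umbraco, here is a minimal sketch. It assumes a hypothetical `Hit` class standing in for Examine's search result (only `Id` and `Score` are used), and `MergeSketch.Merge` is an illustrative name, not part of ezSearch or Examine. One difference from the snippet above: it also drops duplicate ids within the PDF hits themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an Examine search result; only Id and Score matter here.
public sealed class Hit
{
    public int Id { get; set; }
    public float Score { get; set; }
}

public static class MergeSketch
{
    // Appends PDF hits whose Id has not been seen yet, then sorts everything by score, highest first.
    public static List<Hit> Merge(IEnumerable<Hit> contentHits, IEnumerable<Hit> pdfHits)
    {
        var merged = contentHits.ToList();
        var seen = new HashSet<int>(merged.Select(h => h.Id));

        // HashSet.Add returns false for an Id that is already present, so duplicates are skipped.
        merged.AddRange(pdfHits.Where(h => seen.Add(h.Id)));

        return merged.OrderByDescending(h => h.Score).ToList();
    }

    public static void Main()
    {
        var content = new[] { new Hit { Id = 1, Score = 0.9f }, new Hit { Id = 2, Score = 0.4f } };
        var pdfs = new[] { new Hit { Id = 2, Score = 0.8f }, new Hit { Id = 3, Score = 0.6f } };

        // Ids come out in the order 1, 3, 2: the PDF hit for Id 2 is dropped as a duplicate.
        foreach (var hit in Merge(content, pdfs))
        {
            Console.WriteLine(hit.Id);
        }
    }
}
```

The set lookup avoids scanning the whole content list once per PDF hit, which the `Any` call in the Razor snippet does.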


Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Oct 09, 2015 @ 07:05

0

Ben,

It looks like you are doing two separate searches, then concatenating the results, then ordering by score.

However, with Examine you can do multi-index searches: you search over more than one index and the results are already sorted by score. That will be a lot quicker than doing two searches and then an in-memory merge and sort using LINQ.

See https://our.umbraco.org/forum/core/general/20321-Implementing-Multi-Index-Search-in-470 for how to create a multi-index searcher.

Regards

Ismail

keilo 568 posts 1023 karma points

Oct 09, 2015 @ 07:15
0
Hi Ismail

Do you use the new UmbracoExamine.PDF package or the older/alternate index?

The example given in the linked post refers to PDFIndexSet; I'm not sure how much better UmbracoExamine.PDF is.
```
<add name="ContentSearcher" type="Examine.LuceneEngine.Providers.MultiIndexSearcher, MultiIndexSearcher"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcards="true" indexSets="ContentIndexSet,PDFIndexSet" />
```
On a related note, regarding Ben's modification for the Content/Media case: I was asked how to limit the results for logged-in users. I believe this is still not well supported, and there is no easy way to return only the results meant for logged-in users.

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Oct 09, 2015 @ 07:55

0

PDFIndexSet is just the name of the index set; it uses UmbracoExamine.PDF. You can name it whatever you like.

Regards

Ismail

Ben Norman 167 posts 276 karma points

Oct 10, 2015 @ 01:28

0

@keilo The ezSearch package does appear to take care of the security issue. My extra code deliberately does not take care of that :( but you could copy the code that handles it, which is already in ezSearch thanks to Matt.

@ismail I did read and try the info in the post you sent over. Thank you for that. Unfortunately, I got it configured but could not get it to return results :( :( It would have been great to get it working by changing configuration only, instead of having to change the code as I ended up doing, but that's how it goes sometimes.

I will have another go at it on the next project that asks for PDF indexing. Hopefully by then iText will have been swapped out of the PDF indexing for something that is free as well.

Ben Norman 167 posts 276 karma points

Nov 05, 2015 @ 22:48

0

I sent over a pull request for PDF indexing using a free PDF open-upperer. If you need that, look for it in the pull requests for the Umbraco PDF indexer.

bob baty-barr 1180 posts 1294 karma points MVP

Apr 28, 2016 @ 18:16

0

Can someone please provide an example of how this is set up in version 7.4 and the latest version of ezSearch? I can't figure out how to get it to search multiple indexes :(

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Apr 29, 2016 @ 14:08

0

Bob,

See https://our.umbraco.org/forum/developers/api-questions/44289-Examine-Multisearcher. There is config there showing how to set up a multisearcher, so you may need to change your config and add the extra indexes.

Regards

Ismail

bob baty-barr 1180 posts 1294 karma points MVP

Apr 29, 2016 @ 14:13

0

Ismail, thanks for replying! However, I am not seeing anything that looks like a multi-searcher in the thread you linked to. Sorry for being thick.

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Apr 29, 2016 @ 15:17

1

Bob,

Get on Skype, I'm on ismail_mayat. We probably just need to tweak some config files.

bob baty-barr 1180 posts 1294 karma points MVP

Apr 29, 2016 @ 16:55

1

I spent a lovely hour with my dear friend Ismail on Skype earlier today, and we were able to work through how to get this all set up. We commented out some bits from the ezSearch partial, but we were able to get ezSearch to use both the external indexer and the PDF indexer. You can see the files that we tweaked here: https://gist.github.com/rbaty-barr/8448d1384abb0cfcc6b327e3a9804e58

I will be expanding this a bit more with an attempt to pump some Archetype data into the index as well. Once I stumble around and get a working prototype of that, I will share my updated file as well.

Huge thanks to Ismail and Doug Robar on this. Doug and I poked around a bit yesterday as well. #h5yr

Claushingebjerg 939 posts 2574 karma points

May 13, 2016 @ 12:07

0

Awesome that you're tinkering, Bob and Ismail. Any chance you could fix the preview of "Grid" content as well, so it displays the actual content instead of JSON markup?

Pretty pretty please with sugar on top :)

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

May 13, 2016 @ 13:46

0

I believe Bob was doing something for this already? @bob??

Claushingebjerg 939 posts 2574 karma points

May 13, 2016 @ 13:48

0

That would earn @bob a lot of beer at Codegarden!

bob baty-barr 1180 posts 1294 karma points MVP

May 13, 2016 @ 13:50

0

My script handles Archetype right now; I have not really focused on the grid so far. However, the concept would be similar to custom grid output. @Claus, have you done any custom grid output? If so, just follow the same principles in the ezSearch output.

Also, please accept my apologies if you are like me and not really a dev, but pretend to be one from 9-5 ;)

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

May 13, 2016 @ 13:54

0

@claus See https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/76413-examine-indexing-rich-and-complex-property-editors

You can nick that code and modify it in app_start so that, for grid elements, it does an HTTP GET to fetch the rendered content and injects that into the index.

Regards

Ismail
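
If you go the HTTP GET route, the rendered markup will probably need reducing to plain text before it goes into the index. A minimal sketch of that one step, assuming a hypothetical helper called `ToIndexableText` (it is not part of Umbraco, Examine or ezSearch, and the regex approach is only a rough stand-in for a proper HTML parser):

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class GridTextSketch
{
    // Hypothetical helper: reduces rendered HTML to plain text suitable for an index field.
    public static string ToIndexableText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Drop script and style blocks entirely, since their contents are not page text.
        var withoutBlocks = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words do not run together.
        var withoutTags = Regex.Replace(withoutBlocks, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into characters, then collapse whitespace.
        var decoded = WebUtility.HtmlDecode(withoutTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Prints: Hello World & more
        Console.WriteLine(ToIndexableText("<div class=\"grid\"><h2>Hello</h2><p>World &amp; more</p></div>"));
    }
}
```

The resulting string is what you would put into the index field in place of the raw grid JSON.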

bob baty-barr 1180 posts 1294 karma points MVP

Feb 13, 2017 @ 16:32

0

Interestingly, this method is no longer working for me. For some reason, it is only bringing back the PDF files. If I remove the multisearcher, I get the content results, but not when the MultiIndexSearcher is wired up. Any new thoughts on this topic?

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Feb 14, 2017 @ 08:35

0

Bob,

Have you tried rebuilding your indexes? Does this happen on all environments or just production?

Maybe your main index has gone down or is not fully rebuilt.

Regards

Ismail

bob baty-barr 1180 posts 1294 karma points MVP

Feb 14, 2017 @ 13:22

0

Ismail, it isn't that it stopped working on existing sites; it is that the same approach is not working on new sites. I feel like something has changed with the multisearcher; see the gist I posted higher in the thread.

If I use multi, I only get PDFs; if I don't use multi, I get all the Umbraco nodes, but not PDFs.

Any thoughts?
