Elastic search in Umbraco 8

Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib

Jun 16, 2020 @ 09:51

0

Hey guys,

Is there any package or code sample for implementing Elasticsearch for content and PDFs?

Regards, Dhanesh :)

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Jun 16, 2020 @ 10:20

0

Only for content; see https://our.umbraco.com/packages/website-utilities/novicellexamineelasticsearch/. However, you could tap into the media save events and then inject the PDF items into the index.

Regards

Ismail

Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib

Jun 16, 2020 @ 10:31

0

Hey Ismail, yes, but I mean this one: https://www.elastic.co/elasticsearch/

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Jun 16, 2020 @ 10:38

0

Yeah, you create an account and an index on that. Then you use the Elasticsearch package to index the content, and in its config you point it to that index.

Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib

Jun 16, 2020 @ 10:40

0

Oh, so for this we can use this package https://our.umbraco.com/packages/website-utilities/novicellexamineelasticsearch/ for content, right?

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Jun 16, 2020 @ 10:47

0

Yeah, so you don't have to write all the code to get the Umbraco content into Elasticsearch. You also don't have to write the client to query it.

Basically, we have Examine, which handles search and indexing in Umbraco. It wraps around Lucene.NET, and Examine is extensible.

The package link I sent you is an Examine provider for Elasticsearch, so instead of the underlying engine being Lucene it is now Elasticsearch (although Elasticsearch is itself powered by Lucene). There is also an Azure Search provider, ExamineX, although that one is paid.

The Elasticsearch provider only indexes content and media, and for media it only stores stub information such as file name, size and extension, not the actual content of the file. So in theory you can use the Examine events to test whether the item currently being indexed is a media item; if it is, test whether it is a PDF, and if so, extract the text with the PDF library of your choice and inject the extracted content into the index. That way you get the actual PDF content.
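
The flow described here (check that the item is media, check that it is a PDF, extract the text, inject it) can be sketched roughly as follows. This is a language-neutral illustration in Python rather than Umbraco code: in a real site the same logic would live in a C# handler attached to Examine's indexing events. The field names (`category`, `extension`, `path`, `fileTextContent`) and the `extract_text` callable are assumptions made up for the example, not the package's API.

```python
from typing import Callable, Dict


def enrich_index_values(
    values: Dict[str, str],
    extract_text: Callable[[str], str],
    content_field: str = "fileTextContent",  # hypothetical field name
) -> Dict[str, str]:
    """Return a copy of the index values, adding extracted text for PDF media."""
    # Only media items are candidates; content nodes pass through unchanged.
    if values.get("category") != "media":
        return dict(values)
    # Only PDFs are handled in this sketch.
    if values.get("extension", "").lower().lstrip(".") != "pdf":
        return dict(values)
    enriched = dict(values)
    # Inject the extracted text as an extra field next to the stub information.
    enriched[content_field] = extract_text(values["path"])
    return enriched


def fake_extractor(path: str) -> str:
    # Stand-in for a call into a real PDF library (assumption for the example).
    return f"text extracted from {path}"


item = {"category": "media", "extension": "pdf", "path": "/media/1001/brochure.pdf"}
result = enrich_index_values(item, fake_extractor)
```

Passing the extractor in as a parameter keeps the media/PDF check separate from whichever PDF library ends up doing the extraction.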

Regards

Ismail

Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib

Jun 16, 2020 @ 10:50

0

Great, man! Thanks for the explanation.

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Jun 16, 2020 @ 10:55

0

I have done PDF extraction before; see https://our.umbraco.com/packages/website-utilities/cogumbracoexaminemediaindexer/. You could look at the code for this and the libraries I used, and then reuse that.

I did create a composition for the Examine PDF indexer, which uses iTextSharp, and I swapped out the iTextSharp engine for Apache Tika. Apache Tika can extract text from most file formats. It is a bit on the heavy side, as it is written in Java and runs on .NET through IKVM, but it works really well. See https://www.nuget.org/packages/TikaOnDotNet/
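
Swapping one extraction engine for another, as described here, is easier when the indexer depends only on a small extractor interface. The following is a minimal sketch in Python, not Umbraco or Tika code: `ExtractorRegistry`, the extension keys and both stand-in extractor functions are hypothetical, and a real engine would be wrapped behind the same signature.

```python
from typing import Callable, Dict, Optional

# An extractor takes the raw file bytes and returns plain text.
Extractor = Callable[[bytes], str]


class ExtractorRegistry:
    """Picks a text extractor by file extension, with an optional catch-all."""

    def __init__(self, fallback: Optional[Extractor] = None) -> None:
        self._by_extension: Dict[str, Extractor] = {}
        self._fallback = fallback

    def register(self, extension: str, extractor: Extractor) -> None:
        # Normalise ".PDF" and "pdf" to the same key.
        self._by_extension[extension.lower().lstrip(".")] = extractor

    def extract(self, filename: str, data: bytes) -> Optional[str]:
        # Use the part after the last dot as the extension, if there is one.
        extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        extractor = self._by_extension.get(extension, self._fallback)
        return extractor(data) if extractor is not None else None


def pdf_only_extractor(data: bytes) -> str:
    # Stand-in for a PDF-specific library (assumption for the example).
    return "pdf-text"


def general_extractor(data: bytes) -> str:
    # Stand-in for a general-purpose, multi-format engine (assumption).
    return "generic-text"


registry = ExtractorRegistry(fallback=general_extractor)
registry.register(".pdf", pdf_only_extractor)
```

With this shape, a site that only indexes PDFs can register a single lightweight extractor, while a heavier multi-format engine can be plugged in as the fallback without touching the indexing code.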

Nijaz Hameed 38 posts 173 karma points

Aug 03, 2020 @ 08:49

0

@Ismail Mayat PDFsharp does the same job for PDF files, right? Is it fine to use PDFsharp instead of Apache Tika if I only need to extract data from PDF files?

Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib

Jun 16, 2020 @ 10:56

0

Great! 🤘🏻🤘🏻
