  • Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib
    Jun 16, 2020 @ 09:51

    Elasticsearch in Umbraco 8

    Hey guys,

    Is there any package or code sample for implementing Elasticsearch for content and PDFs?

    Regards, Dhanesh :)

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jun 16, 2020 @ 10:20

    Only for content, see https://our.umbraco.com/packages/website-utilities/novicellexamineelasticsearch/. However, you could tap into the media save events and then inject the PDF items into the index.

    Regards

    Ismail

  • Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib
    Jun 16, 2020 @ 10:31

    Hey Ismail, yes, but I mean this one: https://www.elastic.co/elasticsearch/

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jun 16, 2020 @ 10:38

    Yeah, you create an account and an index on that. Then you use the Elasticsearch package to index the content, and in its config you point it to that index.

  • Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib
    Jun 16, 2020 @ 10:40

    Oh, so for the content we can use this package, right? https://our.umbraco.com/packages/website-utilities/novicellexamineelasticsearch/

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jun 16, 2020 @ 10:47

    Yeah, so you don't have to write all the code to get the Umbraco content into Elasticsearch. You don't have to write the client to query it either.

    Basically, we have Examine, which is the search and indexing layer in Umbraco. It wraps around Lucene.Net, and it is extensible.

    The package link I sent you is an Examine provider for Elasticsearch, so instead of the underlying engine being Lucene.Net it is now Elasticsearch (although Elasticsearch is itself powered by Lucene). There is also an Azure Search provider, ExamineX, although that one is paid.

    The Elasticsearch one only does content and media. For media, however, it only indexes stub information such as file name, size and extension, not the actual content of the file. So in theory you can use the Examine events to test whether the item currently being indexed is a media item. If it is, test whether it is a PDF, and if it is, extract the text with the PDF library of your choice and inject the extracted content into the index. That way you get the actual PDF content.

    Regards

    Ismail
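
The three checks described in the post above (is it a media item, is it a PDF, then extract and inject) can be sketched roughly as follows. This is only a language-neutral illustration in Python, not Examine or Umbraco code: the field names (`itemType`, `extension`, `filePath`, `fileTextContent`) and the `extract_text` callback are hypothetical stand-ins, and in a real site the same logic would sit in whatever indexing event the Examine provider exposes, with a real PDF library behind the callback.

```python
from typing import Callable, Dict

# Hypothetical field names, for illustration only. The real names depend on
# the index schema of the Examine provider in use.
ITEM_TYPE_FIELD = "itemType"
EXTENSION_FIELD = "extension"
PATH_FIELD = "filePath"
CONTENT_FIELD = "fileTextContent"


def enrich_with_pdf_text(
    fields: Dict[str, str],
    extract_text: Callable[[str], str],
) -> Dict[str, str]:
    """Return a copy of the fields, adding extracted text for PDF media items."""
    enriched = dict(fields)

    # Step 1: only media items are candidates.
    if enriched.get(ITEM_TYPE_FIELD) != "media":
        return enriched

    # Step 2: only PDFs get text extraction (accepts "pdf", ".pdf", "PDF").
    extension = enriched.get(EXTENSION_FIELD, "").lower().lstrip(".")
    if extension != "pdf":
        return enriched

    # Step 3: extract the text and inject it as an extra field.
    path = enriched.get(PATH_FIELD)
    if path:
        enriched[CONTENT_FIELD] = extract_text(path)
    return enriched


def fake_extractor(path: str) -> str:
    """Stand-in for a real PDF library call (hypothetical)."""
    return "text extracted from " + path


pdf_item = {"itemType": "media", "extension": "PDF", "filePath": "/media/1001/brochure.pdf"}
image_item = {"itemType": "media", "extension": "jpg", "filePath": "/media/1002/logo.jpg"}
content_item = {"itemType": "content", "nodeName": "Home"}

indexed = [enrich_with_pdf_text(item, fake_extractor) for item in (pdf_item, image_item, content_item)]
```

Passing the extractor in as a callback keeps the check logic independent of the PDF library, so swapping one library for another only changes that one function.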

  • Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib
    Jun 16, 2020 @ 10:50

    Great, man! Thanks for the explanation.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jun 16, 2020 @ 10:55

    I have done PDF extraction before, see https://our.umbraco.com/packages/website-utilities/cogumbracoexaminemediaindexer/. You could look at the code for this and the libraries I used, and then reuse that.

    I also created a composition for the Examine PDF indexer, which uses iTextSharp, and swapped out the iTextSharp engine for Apache Tika. Apache Tika can extract most file formats. It is a bit on the heavy side, as it is written in Java and runs through IKVM, but it works really well. See https://www.nuget.org/packages/TikaOnDotNet/

  • Nijaz Hameed 38 posts 173 karma points
    Aug 03, 2020 @ 08:49

    @Ismail Mayat PDFsharp does the same job for PDF files, right? Is it fine to use PDFsharp instead of Apache Tika if I only need to extract data from PDF files?

  • Dhanesh Kumar MJ 167 posts 543 karma points MVP c-trib
    Jun 16, 2020 @ 10:56

    Great! 🤘🏻🤘🏻
