  • Tom 161 posts 322 karma points
    Oct 09, 2018 @ 17:29
    Tom
    0

    UmbracoExamine for Office Documents

    I am on Umbraco 7.5.9. I did not find a NuGet package for MS Office documents (Word & Excel). The only one I found was for PDF.

    Does anyone have a C# example of how to index and search Office documents?

    Thanks

    Tom

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Oct 11, 2018 @ 07:22
    Ismail Mayat
    2

    Tom,

    You can use https://our.umbraco.com/packages/backoffice-extensions/examinefileindexer/. It uses Apache Tika under the hood and will index Office documents.

    With regard to search, are you looking to do a combined search across content and media, or just media? When you install the package it will create an indexer and a searcher for you. You can use that searcher, but it will only cover media.

    If you want combined search you will have to look up how to do multi index search.

    Regards

    Ismail
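
    A minimal sketch of what a combined search could look like in C#, assuming a multi-index searcher (Examine's `MultiIndexSearcher` provider) has already been registered in ExamineSettings.config over both the content index set and the media index set created by the package. The searcher name `MultiSearcher` and the field names `nodeName`, `bodyText` and `fileTextContent` are assumptions for illustration; check the real names in your config files and in Luke.

    ```csharp
    using System.Collections.Generic;
    using Examine;

    public static class CombinedSearch
    {
        // Hypothetical name: must match a searcher registered in ExamineSettings.config.
        private const string SearcherName = "MultiSearcher";

        // Hypothetical field names: replace with the fields present in your indexes.
        private static readonly string[] Fields = { "nodeName", "bodyText", "fileTextContent" };

        public static IEnumerable<SearchResult> Search(string term)
        {
            var searcher = ExamineManager.Instance.SearchProviderCollection[SearcherName];

            // Match the term in any of the listed fields.
            var criteria = searcher.CreateSearchCriteria();
            var query = criteria.GroupedOr(Fields, term).Compile();

            return searcher.Search(query);
        }
    }
    ```

    Each `SearchResult` exposes `Id`, `Score` and `Fields`, so the caller can tell content hits from media hits by looking at the fields that come back. The sketch does not validate or escape `term`; an empty or unescaped term should be handled before calling it.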

  • Tom 161 posts 322 karma points
    Oct 12, 2018 @ 13:24
    Tom
    0

    OK that worked. YOU rock.

    One more question.

    Do you know if there is a way to extract PDF metadata into the search index? I would like to add the PDF title, author, description, and creation date to the indexer.

    Thanks

    Tom

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Oct 12, 2018 @ 22:20
    Ismail Mayat
    1

    Tom,

    It should extract it by default. Check the index with Luke and you should see all the associated metadata.

    Regards

    Ismail

  • Tom 161 posts 322 karma points
    Oct 25, 2018 @ 16:42
    Tom
    0

    Thank you. It does. I appreciate your comments.

    I have another question. I uploaded a PDF to my anonymous site and a different PDF to my secure site's media folder. On my secure site, where users log in using Active Directory and OWIN, I am not getting any search results for my new PDF document, but I am getting search results for the PDF on the anonymous site.

    Would you have any suggestions on why this is so? Note: I am using the following NuGet packages: Cogworks.ExamineFileIndexer and UmbracoCms.UmbracoExamine.PDF.

    Thanks

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Oct 25, 2018 @ 16:45
    Ismail Mayat
    0

    Tom,

    How are you securing the PDFs in the media section? Are you using Media Protect? Also, try searching for the item in the index itself using Luke. I am also not sure why you are using both packages, as mine also handles PDFs.

    Regards

    Ismail

  • Tom 161 posts 322 karma points
    Oct 26, 2018 @ 11:51
    Tom
    0

    Ismail:

    Thanks. I figured it out. In the ExamineIndex.config file I have IndexSets defined. There is an

    So thanks for responding.

  • Tom 161 posts 322 karma points
    Oct 26, 2018 @ 11:52
    Tom
    0

    Ismail:

    Did you know that Umbraco Examine does not index any content in Macros?

    If you know of a way to do this, please let me know.

    Thanks

    Tom

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Oct 26, 2018 @ 12:09
    Ismail Mayat
    2

    Yup, I did. I screen scrape in the gathering node data event to get the rendered content.

  • Tom 161 posts 322 karma points
    Oct 26, 2018 @ 12:21
    Tom
    0

    "Screen scrape". Can you share the code?
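
    No code was posted in reply. As a rough sketch only (not Ismail's actual implementation), screen scraping in Examine's `GatheringNodeData` event on Umbraco 7 might look like the following. It assumes the content indexer is named `ExternalIndexer`, that the site answers on the hypothetical base URL `http://localhost`, and that Umbraco's default routing serves a published page at `/{nodeId}`. The field name `renderedContent` is also made up for illustration.

    ```csharp
    using System.Net;
    using System.Text.RegularExpressions;
    using Examine;
    using Umbraco.Core;
    using UmbracoExamine;

    public class RenderedContentIndexing : ApplicationEventHandler
    {
        // Hypothetical values: adjust to your own index and site.
        private const string IndexerName = "ExternalIndexer";
        private const string BaseUrl = "http://localhost";
        private const string RenderedField = "renderedContent";

        protected override void ApplicationStarted(
            UmbracoApplicationBase umbracoApplication,
            ApplicationContext applicationContext)
        {
            // Hook into indexing so extra fields can be added per node.
            ExamineManager.Instance.IndexProviderCollection[IndexerName]
                .GatheringNodeData += OnGatheringNodeData;
        }

        private static void OnGatheringNodeData(object sender, IndexingNodeDataEventArgs e)
        {
            if (e.IndexType != IndexTypes.Content) return;

            try
            {
                using (var client = new WebClient())
                {
                    // Request the rendered page (macros included) by node id.
                    var html = client.DownloadString(BaseUrl + "/" + e.NodeId);
                    e.Fields[RenderedField] = StripTags(html);
                }
            }
            catch (WebException)
            {
                // Page could not be fetched; index the node without rendered content.
            }
        }

        // Crude HTML-to-text conversion: drop script/style blocks, then all tags.
        private static string StripTags(string html)
        {
            var noScripts = Regex.Replace(html, @"<(script|style)[^>]*>.*?</\1>", " ",
                RegexOptions.Singleline | RegexOptions.IgnoreCase);
            var noTags = Regex.Replace(noScripts, "<[^>]+>", " ");
            return Regex.Replace(WebUtility.HtmlDecode(noTags), @"\s+", " ").Trim();
        }
    }
    ```

    The request is synchronous, so every indexed node costs one page render. On a site that requires login, the request would also need credentials, which this sketch does not handle.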
