  • Damon 217 posts 287 karma points
    Jun 16, 2017 @ 14:48
    0

    PDF Search also bringing back jpgs

    Hi,

    I set up a search to search within PDFs, which is working:

    However, it is also bringing back images and pages. How do I restrict it to PDFs? The extensions above are set to PDFs.

    Thanks a lot!

  • Nicholas Westby 2054 posts 7100 karma points c-trib
    Jun 16, 2017 @ 15:35
    1

    Not sure why that's happening. Here are some notes:

    1. Your code isn't visible. You have to click the curly braces while highlighting your code in the forum editor to ensure it renders properly as a code sample.
    2. You might want to consider trying this instead (it appears to be less costly in commercial usages): https://our.umbraco.org/projects/backoffice-extensions/examinefileindexer/
    3. If you want to keep trying UmbracoExamine.PDF, you might want to submit an issue here: https://github.com/umbraco/UmbracoExamine.PDF/issues
    4. You might want to try rebuilding the Examine index, or even deleting all of them in App_Data/TEMP/ExamineIndexes and then rebuilding again (just to be sure the data you are seeing isn't from an old indexing operation).
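The clean-up in note 4 can be scripted. The following is only a minimal sketch, assuming the site root is reachable as a local folder and that every subfolder of App_Data/TEMP/ExamineIndexes is an index that is safe to delete. The function name `clear_examine_indexes` is hypothetical and not part of Umbraco or Examine. It is probably safest to run it while the site is stopped, since index files may otherwise be locked.

```python
import shutil
from pathlib import Path


def clear_examine_indexes(site_root: str) -> list[str]:
    """Delete every index folder under App_Data/TEMP/ExamineIndexes.

    Returns the names of the folders that were removed. The folder layout
    is taken from the forum post; everything else here is an assumption.
    """
    index_root = Path(site_root) / "App_Data" / "TEMP" / "ExamineIndexes"
    removed: list[str] = []
    if not index_root.is_dir():
        # Nothing to clean up (wrong path, or indexes were never built).
        return removed
    for child in sorted(index_root.iterdir()):
        if child.is_dir():
            shutil.rmtree(child)  # remove the whole index folder
            removed.append(child.name)
    return removed
```

Afterwards, start the site again and rebuild the indexes from the Examine Management dashboard so they are recreated from current data.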
  • Damon 217 posts 287 karma points
    Jun 17, 2017 @ 10:43
    0

    Thanks, I will try.

    What do you mean less costly in commercial usages?

  • Nicholas Westby 2054 posts 7100 karma points c-trib
    Jun 18, 2017 @ 07:05
    1

    What do you mean less costly in commercial usages?

    The license section of the readme says this: https://github.com/umbraco/UmbracoExamine.PDF

    In case a third party integrates your open source application into a closed source application, he/she will have to procure a commercial license of iText.

    An iText license costs several thousand dollars. See here: http://itextpdf.com/Pricing/unit-based

    In contrast, ExamineFileIndexer uses Apache Tika, which appears to be free, as it's using the Apache license: https://tika.apache.org/license.html

    And the Apache license seems to allow for commercial usage: https://tldrlegal.com/license/apache-license-2.0-(apache-2.0)

  • Damon 217 posts 287 karma points
    Jun 19, 2017 @ 13:24
    0

    Hi,

    I installed ExamineFileIndexer. However, it only seems to be indexing the pdf file names, and not the actual contents of the pdf?

    I used the default configuration, described here: https://our.umbraco.org/projects/backoffice-extensions/examinefileindexer/

    I also deleted the temp folder.

    Do you know how to get it to index the contents of pdfs?

    Thanks,

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Jun 23, 2017 @ 07:53
    0

    Damon,

    Can you log an issue on GitHub? Also, can you take a look at the Umbraco log file to see whether any errors were logged? And how did you install it: as a package or via NuGet?

    Regards

    Ismail

  • Nicholas Westby 2054 posts 7100 karma points c-trib
    Jun 19, 2017 @ 15:03
    0

    I've never used it before, so I couldn't really tell you what's wrong. I'd start by checking for errors in the Umbraco error log.

    Also, you may want to submit a bug report here: https://github.com/thecogworks/examinefileindexer/issues

    Some things to try first:

    • Rebuild the index in the developer section (the Examine Management dashboard).
    • Ensure you have configured both the ExamineIndex.config and the ExamineSettings.config files.
    • Ensure you are looking at the correct index name (based on the documentation, that'd be "MediaIndexSet"). If you have search functionality built, it will have to be against this index.
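The second and third checks above can be roughly automated. This is a minimal sketch, assuming the two files live in a `config` folder under the site root and that a plain text match on the index set name is good enough as a first check. It does not parse the XML or validate the configuration, and the function name `configs_mentioning` is hypothetical.

```python
from pathlib import Path


def configs_mentioning(site_root: str, index_set: str = "MediaIndexSet") -> dict[str, bool]:
    """Report whether each Examine config file mentions the index set name.

    The "config" folder location is an assumption; adjust it if the files
    live elsewhere in your site.
    """
    config_dir = Path(site_root) / "config"
    result: dict[str, bool] = {}
    for name in ("ExamineIndex.config", "ExamineSettings.config"):
        path = config_dir / name
        # A missing file counts as "not configured".
        result[name] = path.is_file() and index_set in path.read_text(
            encoding="utf-8", errors="replace"
        )
    return result
```

If either entry comes back `False`, that file is the first place to look before filing a bug report.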