PDF search also bringing back JPGs

Damon 217 posts 288 karma points

Jun 16, 2017 @ 14:48

0

Hi,

I set up a search to search within PDFs, which is working:

However, it is also bringing back images and pages. How do I restrict it to PDFs? The extensions setting above is set to PDFs.

Thanks a lot!

Nicholas Westby 2054 posts 7104 karma points c-trib

Jun 16, 2017 @ 15:35
1
Not sure why that's happening. Here are some notes:
1. Your code isn't visible. You have to click the curly braces while highlighting your code in the forum editor to ensure it renders properly as a code sample.
2. You might want to consider trying this instead (it appears to be less costly in commercial usages): https://our.umbraco.org/projects/backoffice-extensions/examinefileindexer/
3. If you want to keep trying UmbracoExamine.PDF, you might want to submit an issue here: https://github.com/umbraco/UmbracoExamine.PDF/issues
4. You might want to try rebuilding the Examine index, or even deleting all of them in App_Data/TEMP/ExamineIndexes and then rebuilding again (just to be sure the data you are seeing isn't from an old indexing operation).
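If the index keeps returning non-PDF items while the cause is being tracked down, one stopgap is to filter the results by file extension after the search has run. The following is a minimal sketch in Python, not Umbraco or Examine code: the result dictionaries and the `umbracoFile` key are assumptions made for illustration, and the same check would need porting to whatever language the search code is written in.

```python
from pathlib import PurePosixPath


def only_pdfs(results, path_key="umbracoFile"):
    """Keep only results whose file path ends in .pdf (case-insensitive).

    `results` is assumed to be a list of dicts; `path_key` is a
    hypothetical field name holding the media file path.
    """
    kept = []
    for item in results:
        # Content pages have no file path, so they fall back to "".
        path = item.get(path_key) or ""
        if PurePosixPath(path).suffix.lower() == ".pdf":
            kept.append(item)
    return kept


# Made-up sample data: one PDF, one image, one content page.
results = [
    {"nodeName": "Brochure", "umbracoFile": "/media/1001/brochure.PDF"},
    {"nodeName": "Logo", "umbracoFile": "/media/1002/logo.jpg"},
    {"nodeName": "Home page"},
]

print([r["nodeName"] for r in only_pdfs(results)])
```

This only hides the unwanted results. It does not explain why images and pages ended up in the index, so the notes above still apply.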
Damon 217 posts 288 karma points

Jun 17, 2017 @ 10:43

0

Thanks, I will try.

What do you mean less costly in commercial usages?

Nicholas Westby 2054 posts 7104 karma points c-trib

Jun 18, 2017 @ 07:05

1

What do you mean less costly in commercial usages?

The license section of the readme says this: https://github.com/umbraco/UmbracoExamine.PDF

In case a third party integrates your open source application into a closed source application, he/she will have to procure a commercial license of iText.

An iText license costs several thousand dollars. See here: http://itextpdf.com/Pricing/unit-based

In contrast, ExamineFileIndexer uses Apache Tika, which appears to be free, as it's using the Apache license: https://tika.apache.org/license.html

And the Apache license seems to allow for commercial usage: https://tldrlegal.com/license/apache-license-2.0-(apache-2.0)

Damon 217 posts 288 karma points

Jun 19, 2017 @ 13:24

0

Hi,

I installed ExamineFileIndexer. However, it only seems to be indexing the PDF file names, not the actual contents of the PDFs.

I used the default configuration, described here: https://our.umbraco.org/projects/backoffice-extensions/examinefileindexer/

I also deleted the temp folder.

Do you know how to get it to index the contents of PDFs?

Thanks,

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Jun 23, 2017 @ 07:53

0

Damon,

Can you log an issue on GitHub? Also, can you take a look at the Umbraco log file to see whether any errors were logged? And how did you install it: from the package or from NuGet?

Regards

Ismail

Nicholas Westby 2054 posts 7104 karma points c-trib

Jun 19, 2017 @ 15:03
0
I've never used it before, so I couldn't really tell you what's wrong. I'd start by checking for errors in the Umbraco error log.

Also, you may want to submit a bug report here: https://github.com/thecogworks/examinefileindexer/issues

Some things to try first:
- Rebuild the index in the developer section (the Examine Management dashboard).
- Ensure you have configured both the ExamineIndex.config and the ExamineSettings.config files.
- Ensure you are looking at the correct index name (based on the documentation, that'd be "MediaIndexSet"). If you have search functionality built, it will have to be against this index.
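For the log check mentioned above, a small script can pull out the error lines so they are easier to paste into a bug report. This is a rough sketch in Python, assuming a plain-text log where each line contains its level as a standalone word (`ERROR` or `FATAL`). The sample lines are made up for illustration and are not real Umbraco output, and the log file location is left for you to fill in.

```python
import re

# Assumption: the log level appears as a standalone token on each line.
ERROR_LINE = re.compile(r"\b(ERROR|FATAL)\b")


def find_errors(lines, keyword=None):
    """Return lines logged at ERROR or FATAL level.

    If `keyword` is given, keep only lines that also contain it
    (case-insensitive), e.g. the name of an indexer.
    """
    hits = []
    for line in lines:
        if not ERROR_LINE.search(line):
            continue
        if keyword is not None and keyword.lower() not in line.lower():
            continue
        hits.append(line.rstrip("\n"))
    return hits


# Hypothetical sample lines, not copied from a real log.
sample = [
    "2017-06-19 13:20:01,100 [1] INFO  Example.Startup - Boot complete\n",
    "2017-06-19 13:21:15,482 [7] ERROR Example.Indexer - Could not extract text\n",
]

for hit in find_errors(sample, keyword="indexer"):
    print(hit)
```

To run it against a real file, pass the open file object as `lines`, for example `find_errors(open(path_to_log), keyword="examine")`, where `path_to_log` is wherever your site writes its log.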