
Nicholas Westby 2054 posts 7104 karma points c-trib

Apr 11, 2020 @ 00:37
0

Errors: Provider SAXParserFactoryImpl / TransformerFactoryImpl not found
I'm trying to modify the code so I can work with PDFs outside of the media folder, so I've copied your code into my own project to try to get it working.

However, I get an exception during the index process around this line:

I say "around this line" rather than just "this line" because I've actually tried two different approaches. The approaches I tried were:
- Copy the external DLLs from the repo to my bin folder (aside from the ones that were already there, such as those for Examine/UmbracoExamine/Lucene).
- Reference the "TikaOnDotnet.TextExtractor" NuGet package.

With your DLLs, I got this error:

Provider com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl not found

With the NuGet package code, I got this error:

Provider com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl not found

Are there additional steps to installing this that I may be missing (e.g., installing Java if I don't have it, or using a specific version of Java, etc.)?

I'm on Umbraco 7.15.1 in case that is relevant.
Nicholas Westby 2054 posts 7104 karma points c-trib

Apr 11, 2020 @ 00:41

0

Here's one post I came across that seems to imply I may need an older version of Java: https://www.ibm.com/support/pages/javaxxmltransformtransformerfactoryconfigurationerror-provider-orgapachexalanxsltctraxtransformerfactoryimpl-not-found

It mentions something was removed in Java 6 and I seem to have Java 10:

What version of Java has worked for you?

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Apr 11, 2020 @ 09:10

0

Nicholas,

Not sure, to be honest. However, I recently used this on a V8 site hosted on an Azure web app and it works fine. I'm not sure which version of Java is there.

On NuGet, you can look at the version of TikaOnDotnet.TextExtractor I used for the original package and see whether it has a dependency on Java. I don't think you need Java installed. I need to check my web app to see whether Java is present.

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Apr 11, 2020 @ 09:58

0

Nicholas,

I did a quick check on my web app: it's using Java 1.8.0, and I have the same version locally.

Regards

Ismail

Nicholas Westby 2054 posts 7104 karma points c-trib

Apr 11, 2020 @ 22:19
0
Thanks for getting back to me so quickly.

FYI, I tried installing CogUmbracoExamineMediaIndexer into a fresh install of Umbraco 7.15.1 and it seems to be working as expected, so I must be doing something wrong in my custom code.

I'll dig in and post back here with my findings.

As a side note, the package seems to have installed a "MediaSearcher" search provider in ExamineSettings.config that didn't work (it caused a 500 error). I imagine this has to do with specific Umbraco versions. I commented it out and replaced it with one that did work. Both are shown here for your reference (e.g., in case you want to update the documentation or code):

Here's the code version:
```
<add
  name="MediaSearcher"
  type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
  indexSet="MediaIndexSet"
  analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
```
(For easier copying/pasting and Google indexing.)
Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Apr 12, 2020 @ 16:32

0

Nicholas,

In your custom code, am I right in assuming you are writing your own custom indexer that reads a bunch of files in a folder? If so, step through it and check that the path to the PDF you are trying to extract is correct.

Also, the indexer I have written is not a custom indexer but an Examine indexer, which works the same way the original PDF indexer does.

Regards

Ismail

Nicholas Westby 2054 posts 7104 karma points c-trib

Apr 14, 2020 @ 21:41

100

Turns out I was missing this DLL: IKVM.OpenJDK.XML.Transform.dll

It was referenced from NuGet in my app project, but I guess it didn't get copied over to my web project's bin folder during the build. I added the Tika NuGet packages to my web project and that solved the problem.

I still have a bit more work to do, but at least I'm seeing the text extracted during the indexing phase.
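
For anyone who hits the same "Provider ... not found" errors, here is a rough sketch of a quick check that the DLL is actually in the web project's bin folder after a build. It is only an illustration: the helper name `find_missing_dlls` is made up for this example, and the only DLL name taken from this thread is IKVM.OpenJDK.XML.Transform.dll. Any other names you add to the list are your own guesses about what your setup needs.

```python
from pathlib import Path

# The one DLL this thread identified as missing. Extend this list
# yourself if you suspect others; those would be assumptions.
REQUIRED_DLLS = ["IKVM.OpenJDK.XML.Transform.dll"]


def find_missing_dlls(bin_dir, required=REQUIRED_DLLS):
    """Return the names in `required` that are not present in `bin_dir`.

    `find_missing_dlls` is a hypothetical helper for this example,
    not part of Tika, IKVM, or Umbraco.
    """
    bin_path = Path(bin_dir)
    if not bin_path.is_dir():
        # A missing bin folder means everything is missing.
        return list(required)

    # Compare names case-insensitively, as Windows does.
    present = {
        entry.name.lower()
        for entry in bin_path.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".dll"
    }
    return [name for name in required if name.lower() not in present]
```

Calling `find_missing_dlls` with the path to the web project's bin folder returns an empty list when everything in the list is there. If a name comes back, the package that provides it is probably referenced only from a class library and not from the web project, which is what happened in my case.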

