Umbraco 7 Install Error
Has anyone tried to use this package with Umbraco 7? I just tried to install it and got the following error:
Adding the following to the httpRuntime element in web.config solved the install error.
maxRequestLength="1048576" executionTimeout="3600"
That resulted in the following error.
Configuration Error
Description: An error occurred during the processing of a configuration file required to service this request. Please review the specific error details below and modify your configuration file appropriately.
Parser Error Message: Could not load type 'UmbracoExamine.LuceneExamineSearcher' from assembly 'UmbracoExamine'.
Source Error:
Line 27: <!-- This search provider is used to search the PDFIndexSet. -->
Line 28: <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true" />
Line 29: <add name="MediaSearcher" type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine" indexSet="MediaIndexSet" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
Line 30: </providers>
Line 31: </ExamineSearchProviders>
Source File: C:\inetpub\wwwroot\CentrifyMVCControllers\config\ExamineSettings.config Line: 29
Janet,
On line 29 of ExamineSettings.config, can you change
UmbracoExamine.LuceneExamineSearcher
to
UmbracoExamine.UmbracoExamineSearcher
Can you also make sure that you have downloaded the Tika DLL, which is not part of the package, and that it is in the bin directory? If that does not work, report back: I have a newly compiled DLL that is built with v7 and it definitely works.
Regards
Ismail
Thank you Ismail for the quick response.
I did notice that difference in ExamineSettings.config and made that change. I've also added tika-app-1.2.dll to my bin folder. I've deleted and rebuilt the index via the back office Examine Management, and I can see that it is working. However, the PDF documents that are uploaded via content nodes are not in the index. On my test site I have 3 PDF files in the Media section and 3 PDF files that are uploaded to content nodes. Via Windows Explorer I can see 6 folders under the Media folder.
For this client all of the PDF files are uploaded via their content nodes, so I really need to find a way to get those indexed. Is there some other configuration change that I am missing?
Janet,
The indexer will only work with whatever has been uploaded to the Media section. For media uploaded via the content section you will need to do a number of things:
Implement the GatheringNodeData event on the ExternalIndex (it has to be the external index, as this deals with content, not media). In the handler, test whether the current item being indexed is content of a type that has a file upload; if it is, test whether a file has actually been uploaded. If you have a file, then new up the class called TextExtractor, found in CogUmbracoExamineMediaIndexer, like so:
var textExtractor = new TextExtractor();
try
{
    // Replace the placeholder with the full physical path of the PDF from the upload field.
    var textExtractionResult = textExtractor.Extract("replace with file path of your pdf from upload field");
    var pdfText = textExtractionResult.Text;
    // Add the extracted text to the item being indexed so that it becomes searchable.
    e.Fields.Add("pdfContent", pdfText);
}
catch (Exception ex)
{
    // Log the error here so that one unreadable PDF does not stop the rest of the indexing.
}
I have not tested the code above, but this is roughly how you would do it: you are basically getting the contents of the uploaded PDF and injecting the extracted text into your main index so that it is searchable.
Regards
Ismail
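For anyone following along, the checks described in this reply can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the package's own code: the class UploadPdfIndexing, the method TryAddPdfContent and both delegates are names made up for illustration. mapPath stands in for Server.MapPath and extractText stands in for calling TextExtractor.Extract and reading .Text, so the logic can be compiled and tried without Umbraco installed.

```csharp
using System;
using System.Collections.Generic;

public static class UploadPdfIndexing
{
    // Hypothetical helper, not part of Umbraco, Examine or the package.
    // fields      : the field dictionary of the item being indexed (e.Fields in the handler)
    // uploadAlias : alias of the upload property on your document type (an assumption)
    // mapPath     : stand-in for Server.MapPath (relative path -> full path)
    // extractText : stand-in for new TextExtractor().Extract(path).Text
    public static bool TryAddPdfContent(
        IDictionary<string, string> fields,
        string uploadAlias,
        Func<string, string> mapPath,
        Func<string, string> extractText)
    {
        // Step 1: does this item have an upload field with a file in it?
        string relativePath;
        if (!fields.TryGetValue(uploadAlias, out relativePath) ||
            string.IsNullOrWhiteSpace(relativePath))
        {
            return false;
        }

        // Step 2: only PDF uploads are handled in this sketch.
        if (!relativePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        try
        {
            // Step 3: the extractor needs the full path, not the relative one.
            string fullPath = mapPath(relativePath);

            // Step 4: inject the extracted text into the item being indexed.
            // The indexer assignment overwrites an existing "pdfContent" value instead of throwing.
            fields["pdfContent"] = extractText(fullPath);
            return true;
        }
        catch (Exception)
        {
            // Log the error in real code; one unreadable PDF should not stop indexing.
            return false;
        }
    }
}
```

In a real GatheringNodeData handler you would presumably pass in e.Fields, the alias of your own upload property, and delegates that wrap Server.MapPath and TextExtractor.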
Janet,
One more thing: make sure the path of the PDF that you pass in is the full path, not a relative one. In your event code you will get a relative path from e.Fields, so you will need to run it through Server.MapPath and pass the result in.
Regards
Ismail
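To illustrate the relative versus full path point: the upload field holds a site-relative path, while the extractor needs a location on disk. The sketch below only imitates that mapping so it can be run on its own, assuming the physical root folder of the site is known. UploadPathSketch, ToPhysicalPath and siteRoot are hypothetical names, and the example paths are made up; in a real handler you would use Server.MapPath as described above.

```csharp
using System;
using System.IO;

public static class UploadPathSketch
{
    // Hypothetical stand-in for what Server.MapPath does for a site-relative path.
    // siteRoot    : physical folder of the web site (an assumption for this sketch)
    // relativeUrl : value read from the upload field, e.g. "/media/1234/report.pdf"
    public static string ToPhysicalPath(string siteRoot, string relativeUrl)
    {
        if (string.IsNullOrWhiteSpace(relativeUrl))
        {
            throw new ArgumentException("No upload path was supplied.", "relativeUrl");
        }

        // Drop a leading "~" and any leading "/" so the value is treated as relative to siteRoot.
        string trimmed = relativeUrl.TrimStart('~').TrimStart('/');

        // Convert URL separators to the directory separator of the current platform.
        string relative = trimmed.Replace('/', Path.DirectorySeparatorChar);

        return Path.Combine(siteRoot, relative);
    }
}
```

Passing the relative value straight to the extractor would make it look for the file relative to the process working directory, which is usually not the site folder.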
Janet, was this ever resolved for you? Were you able to search PDFs in both situations with the given setup?