When searching with the ExamineManager in my C# project, I can't get any results for the search term I use. When searching by node id, I do get a result, but strangely the "FileTextContent" field is not present in the SearchResult.
Is there anything special I need to do to have the FileTextContent field present in my SearchResult?
Umbraco: 4.8.1
Luke: 1.0.1, 3.5
Examine (and PDF binaries): 1.4.2
<ExamineLuceneIndexSets>
  <!-- The internal index set used by Umbraco back-office - DO NOT REMOVE -->
  <IndexSet SetName="InternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Internal/">
    <IndexAttributeFields>
      <add Name="id" />
      <add Name="nodeName" />
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="path" />
      <add Name="nodeTypeAlias" />
      <add Name="parentID" />
    </IndexAttributeFields>
    <IndexUserFields />
    <IncludeNodeTypes />
    <ExcludeNodeTypes />
  </IndexSet>
  <!-- The internal index set used by Umbraco back-office for indexing members - DO NOT REMOVE -->
  <IndexSet SetName="InternalMemberIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/InternalMember/">
    <IndexAttributeFields>
      <add Name="id" />
      <add Name="nodeName" />
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="loginName" />
      <add Name="email" />
      <add Name="nodeTypeAlias" />
    </IndexAttributeFields>
    <IndexUserFields />
    <IncludeNodeTypes />
    <ExcludeNodeTypes />
  </IndexSet>
  <!-- Default Indexset for external searches, this indexes all fields on all types of nodes-->
  <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/External/" />
  <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/PDFIndexSet" />
</ExamineLuceneIndexSets>
I finally found something: if I use the Search method, I get three fields: FileTextContent, __IndexType and __NodeId.
var resultsSearch = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"].Search("Celtic", true).ToList();
But if I use searchCriteria, I get only Umbraco's node attributes (about 22 of them):
var searchCriteria = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"].CreateSearchCriteria(UmbracoExamine.IndexTypes.Media);
searchCriteria.RawQuery("+FileTextContent:Celtic~");
var resultsRawQuery = ExamineManager.Instance.Search(searchCriteria).ToList();
Is there a way to also get the FileTextContent attribute when I use searchCriteria?
The ~ is used for fuzzy searches (https://lucene.apache.org/core/3_6_0/queryparsersyntax.html).
I've found my error. The code I was using to search was not correct. Here is what I use now, and it works.
// Needs: using Examine; using Examine.LuceneEngine.Providers;
var provider = (LuceneSearcher)ExamineManager.Instance.SearchProviderCollection["PDFSearcher"];
var criteria = provider.CreateSearchCriteria().RawQuery("+FileTextContent:Celtic~");
var results = provider.Search(criteria);
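For anyone landing here later, below is a minimal sketch of reading the extracted PDF text from each hit. It assumes the searcher is registered under the name "PDFSearcher" and that the indexed field is called FileTextContent, as in this thread. The class name PdfSearchSketch, the method PrintMatches and the 200-character preview are purely illustrative and not part of Examine. A likely explanation for the earlier problem (an assumption, not confirmed in this thread) is that ExamineManager.Instance.Search(...) runs the criteria against the default search provider rather than the PDF one, so the sketch creates and executes the criteria on the same searcher.

```csharp
using System;
using Examine;

public static class PdfSearchSketch
{
    // Hypothetical helper: prints the node id and a short preview of the
    // indexed PDF text for every hit on the given term.
    public static void PrintMatches(string term)
    {
        // The name must match the searcher name in ExamineSettings.config.
        var searcher = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"];

        // Create the criteria on the same searcher that will execute it.
        // Note: the term is not escaped here; Lucene special characters
        // in user input would need handling first.
        var criteria = searcher.CreateSearchCriteria()
                               .RawQuery("+FileTextContent:" + term + "~");

        foreach (SearchResult result in searcher.Search(criteria))
        {
            string text;
            // Fields is a dictionary; skip hits that lack the PDF text field.
            if (!result.Fields.TryGetValue("FileTextContent", out text))
            {
                continue;
            }

            var preview = text.Length > 200 ? text.Substring(0, 200) : text;
            Console.WriteLine("{0}: {1}", result.Id, preview);
        }
    }
}
```

Calling PdfSearchSketch.PrintMatches("Celtic") would run the same fuzzy query as the snippet above it.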
Could you please help with the Examine PDF configuration for searching PDF content?
I am using Umbraco 4.9. I copied the latest version of Umbraco Examine PDF from CodePlex and placed the DLLs in the bin folder, but I got stuck on the configuration needed to search PDF content.
Examine PDF : FileTextContent not present in SearchResult
Hello
I'm trying to implement Umbraco Examine PDF to search inside PDFs by following this example: http://examine.codeplex.com/wikipage?title=Full%20Configuration%20Markup%20%26%20Options&referringTitle=UmbracoExamine.
When I look at the index with Luke, I can see the "FileTextContent" field, which contains the PDF's content.
When searching with the ExamineManager in my C# project, I can't get any results for the search term I use. When searching by node id, I do get a result, but strangely the "FileTextContent" field is not present in the SearchResult.
Is there anything special I need to do to have the FileTextContent field present in my SearchResult?
Umbraco 4.8.1
Luke : 1.0.1 , 3.5
Examine (and PDF binaries) : 1.4.2
Fino
Fino,
Can you paste your Examine config files, please? I'm just interested in the PDF indexer bits.
Regards
Ismail
Hello Ismail
Thanks for your help. Here are my two Examine files:
ExamineSettings.config:
ExamineIndex.config:
I finally found something:
If I use the Search method, I get three fields: FileTextContent, __IndexType and __NodeId.
But if I use searchCriteria, I get only Umbraco's node attributes (about 22 of them):
Is there a way to also get the FileTextContent attribute when I use searchCriteria?
Fino
fino,
When using searchCriteria, what do you get if you don't use a raw query? What does ~ do in a Lucene query?
Regards
Ismail
Hello Ismail
The ~ is used for fuzzy searches (https://lucene.apache.org/core/3_6_0/queryparsersyntax.html).
I've found my error. The code I was using to search was not correct.
Here is what I use now, and it works.
Thanks for your help
fino
Hi Ismail,
Could you please help with the Examine PDF configuration for searching PDF content?
I am using Umbraco 4.9. I copied the latest version of Umbraco Examine PDF from CodePlex and placed the DLLs in the bin folder, but I got stuck on the configuration needed to search PDF content.
Thanks,
David