Index files in Virtual Directory in Lucene
I have a problem indexing a bunch of HTML files that live in a Virtual Directory of an Umbraco site; they need to be indexed in Lucene.
The HTML pages are only visible through a single IFrame page. The IFrame doc type has a Url field that references the index.html page in the Virtual Directory.
Is it possible to index all these individual pages?
Does anyone have a clue?
Simon,
Out of the box, Examine only indexes your Umbraco content, using document event handlers: on publish it gets the content and pushes it into the index. If you have other content you want to push into Examine/Lucene, you need to write your own indexer. This is pretty straightforward; take a look at https://github.com/Shazwazza/Examine/wiki/Indexer and https://github.com/Shazwazza/Examine/blob/master/Projects/Examine.Web.Demo/OrmReaderDataService.cs. The second link is code for an indexer that indexes a database; you could modify it to read your HTML files and put them into a new index. You will then need to do a multi-index search if you want to search both the Umbraco content and the HTML content.
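Examine's indexers and data services are written in C#, so what follows is only a language-neutral sketch in Python of the part you would adapt from OrmReaderDataService.cs: walking the files in the Virtual Directory and turning each HTML page into a flat document (id, title, content) that an indexer could hand to Lucene. The function names, the field names, and using the file path relative to the Virtual Directory as the document id are assumptions for illustration, not Examine's API.

```python
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and the <title>, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.title = ""
        self._skip = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip:
            return  # ignore script/style content
        if self._in_title:
            self.title += data
        else:
            self.parts.append(data)


def html_to_document(html, doc_id):
    """Turn one HTML page into a flat dict of fields (hypothetical field names)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace into single spaces.
    text = " ".join(" ".join(parser.parts).split())
    return {"id": doc_id, "title": parser.title.strip(), "content": text}


def collect_html_documents(root):
    """Walk `root` (e.g. the Virtual Directory) and yield one document per .html/.htm file."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                html = f.read()
            # Path relative to the root, with forward slashes, used as a stable id.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            yield html_to_document(html, rel)
```

For example, `list(collect_html_documents(r"D:\sites\mysite\manual"))` (a hypothetical path) would return one dictionary per page; in the C# version each of these would become one indexed item in the new index.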
The other alternative is to implement the GatheringNodeData event: for the doc types that have an IFrame, get the IFrame URL, fetch the content of the HTML page, and put it in a field. That content will then be searchable, and it will be in the context of the Umbraco page. See http://thecogworks.co.uk/blog/posts/2012/november/examiness-hints-and-tips-from-the-trenches-part-2/ for an example of GatheringNodeData.
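To make the GatheringNodeData route concrete, here is a Python sketch of the logic such a handler would run. The real handler is C# code attached to the indexer's GatheringNodeData event (see the linked post), so the function signature, the `iframePage`/`url`/`iframeContent` aliases, and the injected `fetch` callable are all hypothetical. It checks the doc type, reads the Url field, fetches that page, strips the markup with a deliberately crude regex, and stores the text in an extra field. As sketched, it only picks up the single page the Url field points at (index.html), not the other pages in the Virtual Directory.

```python
import re
from typing import Callable, Dict

# Hypothetical aliases: replace with the real doc type and property aliases.
IFRAME_DOCTYPE = "iframePage"
URL_FIELD = "url"
CONTENT_FIELD = "iframeContent"

# Removes <script>/<style> blocks entirely, then any remaining tag.
_TAGS = re.compile(r"<(script|style)\b.*?</\1\s*>|<[^>]+>", re.IGNORECASE | re.DOTALL)


def strip_html(html: str) -> str:
    """Crude tag stripper; adequate for simple content pages, not a full HTML parser."""
    return " ".join(_TAGS.sub(" ", html).split())


def on_gathering_node_data(doc_type: str, fields: Dict[str, str],
                           fetch: Callable[[str], str]) -> None:
    """Mimics a GatheringNodeData handler by adding the IFrame page's text as a field."""
    if doc_type != IFRAME_DOCTYPE:
        return
    url = fields.get(URL_FIELD, "").strip()
    if not url:
        return
    try:
        html = fetch(url)
    except OSError:
        # A missing or unreachable page should not stop the node itself being indexed.
        return
    fields[CONTENT_FIELD] = strip_html(html)
```

In a real setup `fetch` might be something like `lambda u: urllib.request.urlopen(base_url + u).read().decode("utf-8")`, where `base_url` is a hypothetical site root; `urllib.error.URLError` subclasses `OSError`, so fetch failures fall into the `except` branch.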
Regards
Ismail
Thank you for your input! I will dig into the information you posted.