I have an Umbraco site that stores data from a 3rd party service in an Examine index. I want the site to update the indexes every hour or so, however, I'm running into the issue that every time we update the indexes, there's a period where the indexes are either empty, or only partially populated, causing incomplete data.
Does anyone know of a way where we can update the indexes, without affecting the site? For example, updating the indexes to a duplicate set of indexes every hour and then overwriting the indexes that the searches use once that's done? I'm trying to find the best way to approach this and was wondering if anyone has run into anything similar?
I'm interested to know the answer to this too.
Is it the entire index that needs to be recreated or do you know which items should be updated?
First, I need to know which Umbraco and Examine versions you are using.
So to verify: what you are trying to achieve is to update some of the index with new data from this 3rd party service on an interval?
I also need to know what is causing this: "every time we update the indexes, there's a period where the indexes are either empty, or only partially populated, causing incomplete data." How are you updating the indexes currently? How long is the 'period of time'? Does this index contain a ton of data?
Hi @Shannon Deminick,
I've just searched on a similar subject and found your post here. I'd like to know whether it's feasible to update some indexes based on calls to a 3rd party API. I don't know yet whether we could update just the affected items; if not, we would need to update about 20K items once per day (basically rebuilding the whole index collection for this type of item). I should also mention that I'm using the latest version, 0.1.89.
Can you please suggest a solution for this case?
Thanks in advance,
I guess you have built the custom indexer using the SimpleDataIndexer? If so, that means you have to do periodic rebuilds. You've got 2 options:
Thanks for the replies guys!
@Shannon, the site is currently running 4.11.4, but we can upgrade if need be, as the site isn't live yet. We're basically indexing data from a completely separate 3rd party web service that spits out XML. We're using a SimpleDataIndexer, as per the example in the Examine source, except that we connect to the service, get the XML, and convert it to a simple data set before running a full index rebuild. Currently the process runs hourly: a Windows scheduled task calls a /base URL, which calls the RebuildIndex method on the whole index. If you search while the reindexing is running, you get either no data or only some data, because the indexing process is still taking place. Currently there are around 1,000 items in the index, but there will likely be a lot more over time.
@Ismail, sadly the service is a separate web service with no events we can plug into to update each item individually. I guess I could alter the re-indexing code to compare the edit dates on the web service data, and update an item only if there's been a change, using the ReIndexNode method? I would also need to add items when there's an addition, and remove any items in the index that are no longer returned by the web service. That would slow down the indexing I guess, but if it solves the issue of all the data disappearing during the update, then I can live with the slower indexing!
@Anthony, at the moment we're just bulk updating everything, but as I mentioned earlier, I could recode the logic to update records individually if need be; it'll just mean changing the code slightly.
@Tim, you should avoid doing an index rebuild at all costs. It should be a one-time-only thing, and after that you should keep your index up to date by updating only the data that requires it. Rebuilding deletes the index and then starts again from scratch based on your data source. If you update each item based on its ID, the index will always be there; Examine will just ensure that it is updated, and this will not affect your live site (apart from the additional CPU usage, which should be similar to rebuilding the index, if not faster). What I should do in the core of Examine, though, is this: if a rebuild is executed and the index already exists, build an offline index and then overwrite the live one. That isn't implemented right now, and it would still be better to keep your index in sync rather than rebuild.
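For anyone curious what the "rebuild offline, then overwrite the live one" idea looks like in general, here is a minimal sketch. It is written in Python for brevity and is not Examine or Lucene code; SwappableIndex, search and rebuild are hypothetical names made up for illustration, and the "index" is just an in-memory dictionary. The point it shows is that the new data is built off to the side and only swapped in once it is complete, so searches never see an empty or half-built index.

```python
import threading


class SwappableIndex:
    """Toy in-memory index (hypothetical, not Examine's API)."""

    def __init__(self):
        self._live = {}                      # what searches read from
        self._rebuild_lock = threading.Lock()  # one rebuild at a time

    def search(self, term):
        # Take a reference to the current live index once, so a swap
        # that happens mid-search cannot change what we iterate over.
        live = self._live
        needle = term.lower()
        return [text for text in live.values() if needle in text.lower()]

    def rebuild(self, records):
        """Build a complete replacement from (id, text) pairs, then swap."""
        with self._rebuild_lock:
            fresh = {}
            for record_id, text in records:
                fresh[record_id] = text      # the live index is untouched here
            # Only now does the new index become visible to searches.
            self._live = fresh


index = SwappableIndex()
index.rebuild([(1, "apple pie"), (2, "banana bread")])
```

While a later rebuild is still consuming its data source, index.search("apple") keeps returning results from the previous complete index; the new contents appear only after the final assignment.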
Do the records from the web service have a unique key and a last-updated date? If so, shove those in the index. Then have your polling app get the last record ID from the index and check whether the web service has IDs greater than that; those are your new records to insert. For the others, you could use the update date to figure out what has been updated, then delete and re-insert those.
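A minimal sketch of that comparison, in Python for brevity rather than the C# an Umbraco site would use. Record and plan_sync are hypothetical names, and nothing here calls Examine; it only works out which records to add, update or delete. It assumes the index stores each record's ID and last-updated date, and it detects new records by checking which IDs are missing from the index rather than by "IDs greater than the last one", which also covers the removal case mentioned earlier in the thread.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """One item returned by the web service (hypothetical shape)."""
    id: int
    updated: datetime
    title: str = ""


def plan_sync(indexed, fetched):
    """Work out what has to change in the index.

    indexed: dict of record id -> last-updated date currently in the index
    fetched: iterable of Record objects returned by the web service
    Returns three lists: records to add, records to update, ids to delete.
    """
    fetched_by_id = {record.id: record for record in fetched}

    to_add = [r for r in fetched_by_id.values() if r.id not in indexed]
    to_update = [
        r for r in fetched_by_id.values()
        if r.id in indexed and r.updated > indexed[r.id]
    ]
    # Anything in the index that the service no longer returns is stale.
    to_delete = [rid for rid in indexed if rid not in fetched_by_id]
    return to_add, to_update, to_delete


# Example: record 2 changed, record 3 disappeared, record 4 is new.
old = datetime(2013, 1, 1)
new = datetime(2013, 2, 1)
indexed = {1: old, 2: old, 3: old}
fetched = [Record(1, old), Record(2, new), Record(4, new)]
to_add, to_update, to_delete = plan_sync(indexed, fetched)
```

Each list can then be fed to whatever per-item add, re-index and delete calls the indexer provides, so unchanged items are never touched and the index is never emptied.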
Thanks guys, the records do indeed have unique IDs, and we've got them to add an updated date, so I'm going to add a new method that runs hourly and updates just the things that have changed, and we'll do a full rebuild maybe once every two weeks. Thanks for all the advice, it's greatly appreciated :)
I've added a task for this too:
Have this working now, and it works much better! Thanks again guys, tap me up for a beer at Codegarden :)
Just wanna high five you guys, this post saved my custom index :)
still relevant :) thanks, will try to wrap this into a blog post on how to index custom data and keep it up to date!