How to keep load-balanced Examine up-to-date?
Our jobs site is based on Examine.
The indexer screen-scrapes the data from our original jobs board website and creates a SimpleDataSet for each job, so it's not based on Umbraco content. To keep it up to date, a scheduled task runs every two hours to call a web API and kicks off a reindex.
This is running on Azure Web Apps, and works fine when on one server. However, when it scales out, the scheduled task (an Azure Web Job) calls the web API on just one of those servers, and the Umbraco logs record the Examine index being updated on just one server. When I checked the indexes on multiple servers this morning the content was different, and I think that's because only one index is getting updated with each web API call.
Am I correct in thinking that calling the Examine methods BaseIndexProvider.DeleteFromIndex and BaseIndexProvider.ReIndexNode would only update the index for the current machine, rather than all indexes in an Umbraco flexible load-balancing setup?
Rick,
Is the web job calling the API endpoint on the master in your load-balanced setup? If so, then in umbracoSettings.config you should have a list of the servers in your cluster. Update the endpoint so that it gets the list of servers from umbracoSettings.config and then calls rebuild index on all the servers in the cluster.
Regards
Ismail
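The fan-out suggested here could look roughly like the sketch below. It is written in Python for brevity rather than the C# an Umbraco site would use, and it is only an illustration: the `/reindex` route, the server URLs and the function names are all hypothetical, and where the server list comes from is left to the caller.

```python
from typing import Callable, Dict, Iterable
from urllib.request import Request, urlopen


def post_reindex(url: str) -> int:
    """POST an empty body to a reindex endpoint and return the HTTP status."""
    with urlopen(Request(url, data=b"", method="POST"), timeout=30) as resp:
        return resp.status


def reindex_all(servers: Iterable[str],
                path: str = "/reindex",  # hypothetical route, not an Umbraco API
                send: Callable[[str], int] = post_reindex) -> Dict[str, bool]:
    """Ask every server in the list to rebuild its own index.

    Returns a map of server -> whether the call succeeded, so one
    unreachable server does not stop the others being updated.
    """
    results: Dict[str, bool] = {}
    for server in servers:
        url = server.rstrip("/") + path
        try:
            results[server] = 200 <= send(url) < 300
        except OSError:
            # urllib's URLError and HTTPError are both OSError subclasses
            results[server] = False
    return results
```

The `send` parameter is there so the loop can be exercised without real servers; in use, the caller would pass only the list of server base URLs.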
This is flexible load balancing, not legacy, so there's no list of servers in umbracoSettings.config. On Azure Web Apps the servers can change dynamically.
Good point about master though. I've just re-read the docs and maybe I need to designate one server explicitly as master and others as slaves rather than allowing Umbraco to work it out, then I could guarantee master was called.
I am confused. Even with flexible load balancing, don't you need to add a new server to the cluster, and doesn't Umbraco then need to know about it?
With flexible load balancing, servers register themselves in the database when Umbraco boots, so the list is in the Umbraco database, not in umbracoSettings.config.
On further investigation though, I think I may have been on the wrong track. My index should have 107 items. On every instance, from dev to the live back office, it has 107, except on our public site where it has 87, and that doesn't change when I reindex. It doesn't even change when I delete the index under App_Data/TEMP/ExamineIndexes.
The recommended settings for Azure store another copy in a local temp folder, and my guess is that one is in use and is corrupt somehow, preventing updates.
iisreset has fixed the corrupt index.
I will have to keep an eye on the indexes to see whether there is actually a problem with sites updating separately, or whether Umbraco's load balancing just handles it. I suspect it'll be working fine.
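Keeping an eye on the indexes could be as simple as comparing item counts per instance against the expected total. A minimal sketch in Python, assuming the counts have already been collected somehow (how they are read from each instance is not shown, and the function name is made up for illustration):

```python
from typing import Dict


def find_out_of_sync(counts: Dict[str, int], expected: int) -> Dict[str, int]:
    """Return the instances whose index item count differs from the expected one."""
    return {name: count for name, count in counts.items() if count != expected}


# Figures from this thread: every instance has 107 items except the public site.
counts = {"dev": 107, "live back office": 107, "public site": 87}
print(find_out_of_sync(counts, expected=107))
```

A matching count does not prove the content is identical, so this only catches the kind of gross mismatch described above.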
The answer to this proved to be that indexes based on SimpleDataIndexer are not maintained across load-balanced servers in the same way as indexes of Umbraco content are.
The workaround for this is to build the index on a single non-load-balanced server, and write a web API on that server for the load-balanced servers to call and read the data.
See the comments on http://issues.umbraco.org/issue/U4-5993 for the difference between the Umbraco indexes and SimpleDataIndexer indexes.
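The shape of that workaround, seen from one of the load-balanced servers, might be sketched as follows. This is Python rather than C#, and everything named here is an assumption for illustration: the `/jobs/search` route, the `q` parameter, the JSON response and the class name are not taken from the real site.

```python
import json
from typing import Any, Callable, List
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url: str) -> Any:
    """GET a URL and decode the response body as JSON."""
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RemoteJobSearch:
    """Client used by each load-balanced server.

    It holds no index of its own; every query goes to the single
    server that builds and owns the index.
    """

    def __init__(self, index_server: str,
                 fetch: Callable[[str], Any] = http_get_json) -> None:
        self.index_server = index_server.rstrip("/")
        self.fetch = fetch

    def search(self, term: str) -> List[dict]:
        # Hypothetical route and query parameter on the index server
        query = urlencode({"q": term})
        return self.fetch(f"{self.index_server}/jobs/search?{query}")
```

Because only one server ever writes the index, there is nothing to keep in sync; the trade-off is that the index server becomes a dependency for every search request.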
This ties in with what I mention on the course: for single CRUD operations you have to handle this yourself, as there are no events like you have with Umbraco content. It's the same for load balancing: the events that fire to keep things in sync with Umbraco content are not present with the simple indexer.