We are using custom merged fields within the external Examine index to search against. It's pretty standard stuff triggered on the GatheringNodeData event.
BaseIndexProvider baseIndexProvider = ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName];
if (baseIndexProvider != null)
{
    baseIndexProvider.GatheringNodeData += (sender, e) => this.GatheringNodeData(sender, e, helper);
}
This all works... to a point.
Since there are multiple websites, there are multiple Examine indexes. Master always stays correctly synchronised, since we publish content through the back office on that instance. Slave goes out of sync, resulting in the following messages being logged:
2015-11-28 11:10:34,167 [P8776/D2/T94] WARN Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 1449 from Examine index, reverting to looking up media via legacy library.GetMedia method
2015-11-28 11:10:34,292 [P8776/D2/T94] WARN Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 2065 from Examine index, reverting to looking up media via legacy library.GetMedia method
2015-11-28 11:10:34,292 [P8776/D2/T94] WARN Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 2067 from Examine index, reverting to looking up media via legacy library.GetMedia method
2015-11-28 11:10:34,308 [P8776/D2/T94] WARN Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 2069 from Examine index, reverting to looking up media via legacy library.GetMedia method
This is happening a lot...
What I would like to know is what would be the correct way to synchronise the two indexes whenever Master changes the content or media tree.
I know about the PageCacheRefresher.CacheUpdated event. Is that the best place to do this? We are already invalidating our donut cache via an event handler, and I've thrown in a little Examine code to test.
private void PageCacheRefresherCacheUpdated(PageCacheRefresher sender, CacheRefresherEventArgs e)
{
    MessageType kind = e.MessageType;
    if (kind == MessageType.RefreshById || kind == MessageType.RemoveById)
    {
        // Attempt to remove cache by document type alias and template alias.
        int? id = e.MessageObject as int?;
        if (id.HasValue)
        {
            IContentService contentService = ApplicationContext.Current.Services.ContentService;
            IContent entity = contentService.GetById(id.Value);
            this.ClearOutputCacheItem(this, entity);
            if (kind == MessageType.RemoveById)
            {
                // TODO: Test the crap out of this
                ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName].DeleteFromIndex(id.ToString());
            }
            else
            {
                // TODO: How do I reindex one item?
                ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName].RebuildIndex();
            }
        }
    }
    else if (kind == MessageType.RefreshAll)
    {
        // Remove all caches.
        this.OutputCacheManager.RemoveItems();
        ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName].RebuildIndex();
    }
}
You'll see from my comments that I'm a little unsure whether I'm doing the correct thing, and I'm really not happy having to rebuild the entire index when I have the ID of the node in question. I couldn't figure out a way to reindex a single item with the objects present in this event.
Am I on the correct track? If so, could you post a code example indicating the best way to perform synchronisation?
Ok so it seems I might be able to reindex a single node by converting the IContent using the PackagingService. However I cannot find any documentation on that service nor clarity on what the type parameter is.
The below is untested. Is this the correct way to do it?
Publishing on master should trigger publishing on the slaves, along with all related events including indexing, so in theory you should not need to do this manually. Unless it does not work?
With regard to reindexing, Matt had to reindex a node recently and got the node XML from the cache, thereby saving a DB hit. He is digging out some code.
I think my scenario was slightly different, as I have a parent/child document setup where, when the child node indexes itself, it grabs content from the parent node to index along with it. So what follows is how I force the child nodes to reindex when the parent node gets updated in the cache. Not sure how much it will help, but it might:
CacheRefresherBase<PageCacheRefresher>.CacheUpdated += (sender, args) =>
{
    IPublishedContent publishedContent = null;
    if (args.MessageType == MessageType.RefreshById)
    {
        publishedContent = UmbracoContext.Current.ContentCache.GetById((int)args.MessageObject);
    }
    else if (args.MessageType == MessageType.RefreshByInstance)
    {
        publishedContent = UmbracoContext.Current.ContentCache.GetById(((IContent)args.MessageObject).Id);
    }
    if (publishedContent != null && publishedContent.DocumentTypeAlias == Constants.DocTypeAliases.MyDocTypeAlias)
    {
        // Reindex child nodes
        foreach (var child in publishedContent.Children)
        {
            var xmlStr = umbraco.library.GetXmlNodeById(child.Id.ToString()).Current.OuterXml;
            var xml = XElement.Parse(xmlStr);
            ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"]
                .ReIndexNode(xml, IndexTypes.Content);
        }
    }
};
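For the single-node case asked about in the original question, the same calls could be wrapped in a small helper. This is only an untested sketch built from the calls used in the handler above: `SingleNodeReindexer` and `ReIndexContent` are hypothetical names, and the check for an `<error>` element is an assumption about what the cache lookup returns when the node is not published.

```csharp
using System.Xml.Linq;
using Examine;
using UmbracoExamine;

// Hypothetical helper, not part of Umbraco or Examine.
public static class SingleNodeReindexer
{
    public static void ReIndexContent(int nodeId, string indexerName)
    {
        // Read the published XML from the cache rather than hitting the database.
        var iterator = umbraco.library.GetXmlNodeById(nodeId.ToString());
        if (iterator.Current == null)
        {
            return;
        }

        var xml = XElement.Parse(iterator.Current.OuterXml);

        // Assumption: an unpublished or missing node comes back as an <error> element.
        if (xml.Name.LocalName == "error")
        {
            return;
        }

        // Reindex just this node instead of rebuilding the whole index.
        ExamineManager.Instance.IndexProviderCollection[indexerName]
            .ReIndexNode(xml, IndexTypes.Content);
    }
}
```

In a RefreshById branch this might be called as `SingleNodeReindexer.ReIndexContent(id.Value, SearchConstants.IndexerName)` in place of a full `RebuildIndex()`.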
We are experiencing similar issues. I think you are also using the new flexible load balancing introduced in 7.3.
How this works is that when you make a change to a content or media item, a cache instruction is written to the database (table umbracoCacheInstruction).
On the first request to your slave app, it will look in a file in the folder App_Data/TEMP/DistCache/{machineName}/*-lastsynced.txt. This file contains the id of the last cache instruction that has run on your slave server. It will then load all cache instructions from the db with a later id than the one in the file.
So if you change something on master, these changes aren't immediately executed on the slave instances. They need to receive a request first.
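The polling behaviour described above can be modelled with a small self-contained sketch. This is not Umbraco's implementation: `CacheInstruction`, `SlaveSync`, `Pending` and `ProcessOnRequest` are hypothetical names, the in-memory collection stands in for the umbracoCacheInstruction table, and the integer stands in for the id stored in the lastsynced file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row in the umbracoCacheInstruction table.
public sealed class CacheInstruction
{
    public int Id { get; private set; }
    public string Payload { get; private set; }

    public CacheInstruction(int id, string payload)
    {
        Id = id;
        Payload = payload;
    }
}

// Hypothetical model of what a slave does when a request arrives.
public static class SlaveSync
{
    // Returns every instruction with an id later than the last one this slave ran.
    public static List<CacheInstruction> Pending(IEnumerable<CacheInstruction> table, int lastSyncedId)
    {
        return table.Where(i => i.Id > lastSyncedId).OrderBy(i => i.Id).ToList();
    }

    // Applies the pending instructions in id order and returns the new last-synced id.
    // Nothing happens until this is called, which is why a slave lags behind
    // master until it receives a request.
    public static int ProcessOnRequest(IEnumerable<CacheInstruction> table, int lastSyncedId, Action<CacheInstruction> apply)
    {
        foreach (var instruction in Pending(table, lastSyncedId))
        {
            apply(instruction);
            lastSyncedId = instruction.Id;
        }

        return lastSyncedId;
    }
}
```

With instructions 1 to 3 in the table and a last-synced id of 1, `ProcessOnRequest` applies instructions 2 and 3 and returns 3; a second call with that value applies nothing.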
Examine indexes are updated asynchronously, so it's possible that a request to your page is executed before the index is updated, giving you out-of-date content.
And as I see you are also using output caching, your page will be cached and will not reflect the changes until the next time the cache is refreshed.
We started noticing this when editors began changing the focal point on images. The change was immediately visible on the master server but not on the slaves. To make it visible on the slave servers, we need another content change action in the back end (to clear the output cache).
Synchronising Examine indexes.
Hi all,
I have the following webapp setup on Azure.
Cheers!
I pass the package service as follows...
James,
Thanks Matt.
This is really helpful as I have the same scenario as you and was stuck as to how to get the child nodes to refresh in the index.
Ver
Hi James,
Still looking for a way on how to solve this.
Dave