So the problem we’ve noticed is that sometimes some of the frontend servers are not in sync with the others.
This happens at random, but generally about once a week. What usually happens is that the client tells us that his changes are not visible on the web, so we browse every single frontend server directly using its IP and see that one (sometimes two) of them is not in sync with the others. This usually goes away when we log in to the back office and republish the entire site.
Every time this has happened, an error is logged. This error:
DISTRIBUTED CACHE IS NOT UPDATED. Failed to execute instructions (id: 20828, instruction count: 2). Instruction is being skipped/ignored
System.Exception: Indexing Error Occurred: Cannot index queue items, the index is currently locked
at Examine.LuceneEngine.Providers.LuceneIndexer.OnIndexingError(IndexingErrorEventArgs e) in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 569
at Examine.LuceneEngine.Providers.LuceneIndexer.ForceProcessQueueItems(Boolean block) in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1497
at Examine.LuceneEngine.Providers.LuceneIndexer.StartIndexing() in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1442
at Examine.LuceneEngine.Providers.LuceneIndexer.SafelyProcessQueueItems() in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1421
at Examine.LuceneEngine.Providers.LuceneIndexer.AddNodesToIndex(IEnumerable`1 nodes, String type) in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 867
at UmbracoExamine.BaseUmbracoIndexer.ReIndexNode(XElement node, String type)
at Examine.ExamineManager._ReIndexNode(XElement node, String type, IEnumerable`1 providers) in X:\Projects\Examine\Examine\src\Examine\ExamineManager.cs:line 197
at Umbraco.Web.Search.ExamineEvents.ReIndexForContent(IContent sender, Boolean isContentPublished)
at Umbraco.Core.Events.TypedEventHandler`2.Invoke(TSender sender, TEventArgs e)
at Umbraco.Core.Cache.CacheRefresherBase`1.OnCacheUpdated(TInstanceType sender, CacheRefresherEventArgs args)
at Umbraco.Core.Sync.DatabaseServerMessenger.RefreshByIds(Guid uniqueIdentifier, String jsonIds)
at Umbraco.Core.Sync.DatabaseServerMessenger.NotifyRefreshers(IEnumerable`1 instructions, HashSet`1 processed)
at Umbraco.Core.Sync.DatabaseServerMessenger.ProcessDatabaseInstructions(IReadOnlyCollection`1 instructionBatch, CacheInstructionDto dto, HashSet`1 processed, Int32& lastId)
Also we get an error with the logger LocalTempStorageIndexer:
Could not create index writer with snapshot policy for copying, the index cannot be used
And a whole bunch of these errors:
Provider=InternalIndexer, NodeId=-1
System.Exception: Cannot index queue items, the index is currently locked,, IndexSet: InternalIndexSet
All the errors seem to be related to something being locked, but I’m not sure where to start looking.
<?xml version="1.0"?>
<!--
Umbraco examine is an extensible indexer and search engine.
This configuration file can be extended to add your own search/index providers.
Index sets can be defined in the ExamineIndex.config if you're using the standard provider model.
More information and documentation can be found on GitHub: https://github.com/Shazwazza/Examine/
-->
<Examine RebuildOnAppStart="true">
  <ExamineIndexProviders>
    <providers>
      <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportUnpublished="true" supportProtected="true" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" useTempStorage="Sync"/>
      <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine" supportUnpublished="true" supportProtected="true" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" useTempStorage="Sync"/>
      <!-- default external indexer, which excludes protected and unpublished pages-->
      <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" useTempStorage="Sync"/>
      <add name="MainIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" indexSet="MainIndexSet" runAsync="false" interval="600" useTempStorage="Sync"/>
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="ExternalSearcher">
    <providers>
      <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" useTempStorage="Sync"/>
      <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" useTempStorage="Sync"/>
      <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true" useTempStorage="Sync"/>
      <add name="MainSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" indexSet="MainIndexSet" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" enableLeadingWildcard="true" useTempStorage="Sync"/>
    </providers>
  </ExamineSearchProviders>
</Examine>
(DISTRIBUTED CACHE IS NOT UPDATED) error in Load Balanced Environment
Hello community.
I'm having an issue with a load-balanced environment and I need your help/guidance.
Info about site:
Has anyone else experienced this? Any tips?
Much appreciated.
I'm assuming you are using Azure?
They have made some load balanced bug fixes and index improvements in 7.6 so that's definitely worth a look.
In 7.6 there is a health check method on each of the indexes to determine if they are in sync. What I did: I wrote a health check endpoint that automatically checks the health of these indexes (this endpoint is hit on every server every 30s). If one is found to be unhealthy you can either replace the server or call the method to automatically rebuild the index.
Hi Phil.
No it's not Azure.
Hmm, it might be worth upgrading. It's always a risk: this is a pretty huge project and it's hard to know (as always) whether an upgrade would take one hour or one day (or more), and after that I'm not sure that this issue would actually be fixed.
Your health check sounds cool, but it doesn't actually fix the problem. It just checks whether the indexes are out of sync, and then a manual action is needed? This is cool and useful, but I want to find the source of the problem.
Thanks!
Hi Dennis,
How does your ExamineSettings.config currently look?
Jeavon
Here is my current ExamineSettings.config:
Ok cool and do you have the tokenised paths in the ExamineIndex.config?
Do your front end servers sync their file system and if so do they sync the TEMP folder?
Hi again Jeavon.
Here is my ExamineIndex.config:
About the sync, I'm not sure actually. I haven't built this project, I'm just maintaining it. Sounds like something I need to contact the server admins and ask about.
Thank you for helping!!
Hi Dennis,
Yes, my guess would be that something is syncing, although you are using useTempStorage="Sync" so it should be OK, as the ASP.NET temp location shouldn't be synced.
Anyhow, I would recommend you consider updating to the latest Examine v0.1.83 and using the new DirectoryFactory instead (this means you can also remove the tokenised paths): https://our.umbraco.org/documentation/Getting-Started/Setup/Server-Setup/load-balancing/flexible#examine-v0-1-83
This new option moves the indexes to the local system temp folder (and only keeps a single copy of the indexes) which shouldn't be subject to any syncing so avoiding locking issues.
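As a rough sketch of what that change might look like in ExamineSettings.config, assuming Examine v0.1.83 or later is installed: the useTempStorage attribute is replaced by a directoryFactory attribute on each index provider. The TempEnvDirectoryFactory type name is taken from the load-balancing documentation linked above; check it against the Examine version you actually deploy, and apply the same change to the other indexers.

```xml
<!-- ExamineSettings.config fragment (sketch, assumes Examine v0.1.83+). -->
<!-- useTempStorage="Sync" is dropped in favour of directoryFactory.    -->
<add name="ExternalIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     directoryFactory="Examine.LuceneEngine.Directories.TempEnvDirectoryFactory, Examine"/>
```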
Jeavon
First off, if you are NOT using Azure (I'm using AWS) you can turn off the sync rubbish, it's an Azure "fix" (hack). I.e. remove this setting completely, you won't need it: useTempStorage="Sync"
With my health check there is no manual action needed. The indexes can be refreshed with an automated API call (or pull the box down and AWS will auto-deploy a fresh one, which will build a fresh index).
Umbraco has had lots and lots of these index/publishing issues. Some have been fixed in code, others have been fixed by installing optional Windows patches on the servers.
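For illustration, removing the setting from the ExternalIndexer entry posted earlier in the thread would leave something like the fragment below. This is only a sketch, and it assumes each server keeps its index on a local file system that is not replicated to the other servers; the same edit would apply to the remaining indexers and searchers.

```xml
<!-- ExamineSettings.config fragment (sketch): same provider as posted, -->
<!-- with the useTempStorage attribute simply removed.                  -->
<add name="ExternalIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>
```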
So I won't need useTempStorage="Sync"? Do I just remove it, or do I provide it with some other value?
Ah, I see. Sorry, I misunderstood your health check.
Phil, regardless of your infrastructure choice, if the file system is being replicated you must exclude the Examine indexes and umbraco.config from the replication. useTempStorage="Sync" is one method, albeit not the currently recommended method of doing this.
If the file system is not replicated then this isn't an issue (I imagine that's your setup).
Your dashboard/endpoint solution sounds interesting, is the code on GitHub somewhere?
Jeavon: I'm reading up on this page: https://our.umbraco.org/documentation/Getting-Started/Setup/Server-Setup/load-balancing/files-replicated.
So the TEMP folder should not be synced between environments? Just so I know what to ask the server admins.
Alternatively: would adding this to my web.config help?
Possibly, if your TEMP folder is being synced then you would need to set that as otherwise the umbraco.config will be synced between servers.
I think you should find out from the server admins if there is any syncing going on between any of the file systems or if there is none.
Although this won't be causing the exception you posted as that relates to Examine.
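The web.config snippet from the question above is not preserved in this thread. For reference only, and as an assumption about what was meant: Umbraco 7 has an appSetting that stores umbraco.config in the local ASP.NET temp folder instead of App_Data. The key name below is the one used by Umbraco 7 releases before 7.7.3; later releases use a different key, so confirm it against the documentation for your version.

```xml
<!-- web.config fragment (sketch; key name assumed for Umbraco 7 before 7.7.3). -->
<!-- Keeps the umbraco.config content cache in the local ASP.NET temp folder.  -->
<appSettings>
  <add key="umbracoContentXMLUseLocalTemp" value="true" />
</appSettings>
```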
You are correct, my file system is not shared. Each server instance uses its own local file system, so I don't need the sync.
The index "dashboard" is under the Developer tab in v7.6.
I just consumed the same back office API endpoints that it uses.
OK, we also made a small dashboard for requesting that specific indexes be rebuilt on specific servers, although we don't automatically monitor status, which is a nice idea. The dashboard is available at https://github.com/CrumpledDog/ExamineDistributedDashboard
AWS hits each server for a healthcheck. I just built a custom endpoint that does some basic checks and throws error logs for monitoring purposes.
Hi, Dennis
Have you managed to fix this issue? If so, can you please share how you did it? We have the same issue on our production website, Umbraco 7.12.3. All the solutions mentioned here have already been applied on our websites.
Thank you in advance!
Hi Ruslan. Actually no, this problem went away by itself and we haven’t had any issues since then. We have however upgraded Umbraco several times, so I’m not sure if this is something that went away in an upgrade. Maybe you could try upgrading to 7.13.x and see if it solves the problem?