We are using Umbraco 7.3.4 and parts of the site use Examine searches extensively. Content updates to the site are occasional so publishes etc are not that frequent. On publish we do tap into examine events to inject fields into the index.
Since updating to Umbraco 7.3.4 we have been getting issues with indexes corrupting.
There is nothing in the Umbraco logs. We are also using elmah.io and there is nothing in elmah either. The site is sitting on Azure and we have looked at the event log, but cannot see anything there. We thought initially it might have something to do with the app pool restarting and the index rebuilding, so we updated the ExamineSettings config and set RebuildOnAppStart to false. This fixed the issue for about 2 weeks, however recently it happened again.
We are having to rebuild indexes, and in some instances restart the app pool and then rebuild indexes, to get the search-powered functionality back up.
We've had issues with indexes locking which then get out of sync / corrupted requiring app pool to be stopped to free up the lock and reindex. Applying the hot fix mentioned in that issue seems to have resolved it for us so far.
Not that I can see, we'll take another peek though. Also we will upgrade to 7.3.7 ASAP. Hopefully that may fix it. Jeavon's investigations look very promising, and his coding skills there are a good reason for his back-to-back MVPs, the guy is a legend!!
Still getting this issue at least once every 2 weeks. Anyone seen this before?
I did see this in the umbracolog:
ERROR Umbraco.Core.UmbracoApplicationBase - An unhandled exception occurred
System.UnauthorizedAccessException: Access to the path 'D:\home\site\wwwroot\App_Data\TEMP\ExamineIndexes\RD000D3A203E7D\External\Index\segments.gen' is denied.
at Lucene.Net.Store.SimpleFSDirectory.OpenInput(String name, Int32 bufferSize)
at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
at Lucene.Net.Index.SegmentInfos.Read(Directory directory)
at Lucene.Net.Index.SegmentInfos.ReadCurrentVersion(Directory directory)
at Lucene.Net.Index.DirectoryReader.IsCurrent()
at Lucene.Net.Index.DirectoryReader.DoReopenNoWriter(Boolean openReadOnly, IndexCommit commit)
at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 359
at Examine.LuceneEngine.Providers.LuceneSearcher.GetSearcher() in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 239
at Examine.LuceneEngine.Providers.BaseLuceneSearcher.Search(ISearchCriteria searchParams, Int32 maxResults) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\BaseLuceneSearcher.cs:line 175
at Application.BusinessLogic.Services.SearchService.SearchTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\SearchService.cs:line 252
at Application.BusinessLogic.Services.TalentDetailService.GetTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\TalentDetailService.cs:line 109
at Application.Web.ContentFinders.TalentContentFinder.TryFindContent(PublishedContentRequest request) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.Web\ContentFinders\TalentContentFinder.cs:line 33
at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
at Umbraco.Web.Routing.PublishedContentRequestEngine.FindPublishedContent()
at Umbraco.Web.Routing.PublishedContentRequestEngine.FindPublishedContentAndTemplate()
at Umbraco.Web.Routing.PublishedContentRequestEngine.PrepareRequest()
at Umbraco.Web.UmbracoModule.ProcessRequest(HttpContextBase httpContext)
at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
I gather from Twitter that you're using Azure? Have you tried setting the Indexers and Searchers to store data on the web worker? You can set it to LocalOnly or Sync.
We've seen problems with the default configuration where performance is very poor because the indexes are stored on a remote fileshare. Apart from bad performance it's possible that file locking issues occur because sometimes your Azure site gets moved to a different web worker (server), however sometimes file locks are not released when a move happens and your files will be in use for some time leading to errors like the one above.
We are using Azure. Currently in ExamineSettings.config we have
<Examine RebuildOnAppStart="false">
We did this because we thought the issue was after doing deploys the index was getting rebuilt and barfing. So looking at http://issues.umbraco.org/issue/U4-5993 we have 2 options:
1. Sync
2. LocalOnly
With RebuildOnAppStart set to false, if we use either of those 2 options, would the index get trashed after a deploy to bin, thus requiring a manual rebuild?
Okay, so you tried to work around what exactly? Do you have a huge index?
In which case, I think Sync makes the most sense, as the only thing it does is copy the existing indexes from the fileshare to your site's temp folder on the web worker. Then it updates both local and fileshare when updates are needed.
If you don't have a huge index that takes loads of time rebuilding then I would set RebuildOnAppStart to true again and see how it goes, and then use LocalOnly with that.
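For reference, a minimal sketch of what the Sync option could look like in ExamineSettings.config. It assumes the stock ExternalIndexer and ExternalSearcher provider names and shows only those two providers; swap "Sync" for "LocalOnly" to try the other mode.

```xml
<!-- ExamineSettings.config (fragment): only the external providers are shown -->
<Examine RebuildOnAppStart="false">
  <ExamineIndexProviders>
    <providers>
      <!-- useTempStorage takes "Sync" or "LocalOnly" -->
      <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           useTempStorage="Sync"/>
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="ExternalSearcher">
    <providers>
      <!-- set on the searcher as well, since the suggestion covers both indexers and searchers -->
      <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           useTempStorage="Sync"/>
    </providers>
  </ExamineSearchProviders>
</Examine>
```

RebuildOnAppStart is left at "false" here only because that is what the site currently uses; with LocalOnly you would probably want it back on "true".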
It is a big index and takes a while to rebuild, so I will try it with Sync. We have staging set up on Azure, so will try those config updates then slam the site and see what happens.
I have turned this on for our staging site. I have also hit it with some load and cannot get the indexes to fall over. However, what I don't understand is how swapping to TempStorage will fix the issue.
We are not load balancing, we are on a single Azure website. So does Azure do some voodoo NAS stuff with website folders under the hood?
We just want to confirm this will work before trying it on live as we have a cranky client and do not want to promise any more false dawns.
The voodoo is that all of your site's files on Azure live on a fileshare, which means that they don't actually exist on the machine that your website is running from (so the IIS server points to a fileshare to get all the files).
This means there's a significant lag in actually reading and writing files.
Examine/Lucene.net obviously doesn't like this lag very much; part of the reason that it's so fast is that it has really fast access to files on disk. If the disk is remote it doesn't have that fast access.
By setting up the TempStorage you force the files to actually move to the temp folder on the IIS server, so they no longer just live on the fileshare.
I don't know what you mean "if it doesn't exist". If the indexes exist in the website (App_Data/TEMP/ExamineIndexes) and you set TempStorage to Sync then if the indexes do not exist in the ASP.NET temp folder when the site starts, the indexes will be copied from the fileshare to the web worker's ASP.NET temp folder and be kept in sync with the indexes on the fileshare. If it doesn't exist on the fileshare then.. yeah, you will need to manually build them as RebuildOnAppStart is set to false.
I'd say try this setup out first. Then consider setting RebuildOnAppStart back to true. This shouldn't constantly rebuild indexes, ONLY when they don't exist yet on app start.
You said:
"if the indexes do not exist in the ASP.NET temp folder when the site starts, the indexes will be copied from the fileshare to the web worker's ASP.NET temp folder"
So the index is copied? It's not re-generated at the temp storage location?
The issue we had initially is that the index was corrupt. So what you're saying is the corrupt index will be copied?
What you had before doesn't count any more when you turn Sync on. Yes, they will be copied on app start then when any updates happen they get applied to both the web worker's asp.net temp folder AND the remote fileshare. If your site gets moved to a new web worker (this happens once in a while) the indexes once again first get copied from the fileshare to the web worker's asp.net temp folder and then updates will be applied in both places again.
If the index does manage to get corrupted on the fileshare then yes, it will be copied in that exact state. Obviously, you could set the setting to LocalOnly but it would mean a rebuild each time the site moves to a different server, which means startup time will be very long when that happens (as you've indicated you have a large amount of content to index).
So yes, again, it would be great to figure out why indexes get corrupted. I am no help there, I'm afraid.
Firstly - it would seem that part of this problem is that you are using the old TempStorage (https://github.com/Shazwazza/UmbracoExamine.TempStorage), which is obsolete because the functionality is included in the Core. Furthermore, the functionality included in the Core has many fixes and works much better. Secondly, the legacy TempStorage provider is discontinued and will no longer be developed (I'll make a note of this on GitHub).
The only ways that the index can actually get corrupted (meaning that it is unreadable/cannot be opened, for example because Lucene files are missing) are:
If someone is mucking around with those files directly - never do this, whether reading or writing
If somehow the IIS process is unexpectedly terminated without warning, specifically at the exact moment that Lucene is attempting to write files
Having "Sync" turned on doesn't in any way mean that your primary index storage (i.e. non temp storage) is more likely to get corrupted.
I plan on migrating this functionality into the Examine core at some stage whilst allowing more storage options for Azure but I have no time right now. The LocalOnly and Sync options are only available for Umbraco indexers/searchers, these options are not available for any custom indexers/searchers that you may have ... which is part of the reason this functionality needs to be migrated to Examine Core.
In the meantime, please ensure you are using the Umbraco Core indexers/searchers and not the TempStorage provider.
And as always, providing steps to reproduce goes a long way (once you are using the Umbraco Core indexers), this includes all information regarding how your environment is setup.
I am not using the obsolete package but using what is in core. We do not mess with lucene files directly in any way so rules that one out. The IIS process unexpectedly terminating could be a possibility.
We are not using any custom indexers just External and internal indexers.
I will keep an eye on it and if it goes again try and report back a bit more information.
Here's the config we have on the site that loses indexed content most often and rebuild from a local server like so. (A couple others, but every six months)
I guess it's like it should be?
<?xml version="1.0"?>
<!--
Umbraco examine is an extensible indexer and search engine.
This configuration file can be extended to add your own search/index providers.
Index sets can be defined in the ExamineIndex.config if you're using the standard provider model.
More information and documentation can be found on CodePlex: http://umbracoexamine.codeplex.com
-->
<Examine RebuildOnAppStart="false">
  <ExamineIndexProviders>
    <providers>
      <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           supportUnpublished="true"
           supportProtected="true"
           runAsync="true"
           useTempStorage="Sync"
           analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
      <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
           supportUnpublished="true"
           supportProtected="true"
           runAsync="true"
           useTempStorage="Sync"
           analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
      <!-- default external indexer, which excludes protected and unpublished pages-->
      <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           runAsync="true"
           useTempStorage="Sync"/>
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="ExternalSearcher">
    <providers>
      <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"
           useTempStorage="Sync"
      />
      <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           useTempStorage="Sync"
      />
      <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
           enableLeadingWildcard="true"
           useTempStorage="Sync"
      />
    </providers>
  </ExamineSearchProviders>
</Examine>
I need to understand what the actual issue people are seeing here is. 'Corrupted' could mean many things. Ismail is the only one that has posted a stack trace. Is this the exact same issue everyone is seeing?
System.UnauthorizedAccessException: Access to the path 'D:\home\site\wwwroot\App_Data\TEMP\ExamineIndexes\RD000D3A203E7D\External\Index\segments.gen' is denied.
at Lucene.Net.Store.SimpleFSDirectory.OpenInput(String name, Int32 bufferSize)
at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
at Lucene.Net.Index.SegmentInfos.Read(Directory directory)
at Lucene.Net.Index.SegmentInfos.ReadCurrentVersion(Directory directory)
at Lucene.Net.Index.DirectoryReader.IsCurrent()
at Lucene.Net.Index.DirectoryReader.DoReopenNoWriter(Boolean openReadOnly, IndexCommit commit)
at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 359
at Examine.LuceneEngine.Providers.LuceneSearcher.GetSearcher() in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 239
at Examine.LuceneEngine.Providers.BaseLuceneSearcher.Search(ISearchCriteria searchParams, Int32 maxResults) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\BaseLuceneSearcher.cs:line 175
at Application.BusinessLogic.Services.SearchService.SearchTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\SearchService.cs:line 252
at Application.BusinessLogic.Services.TalentDetailService.GetTalentByUrl(String type, String urlPart) in
From that stack trace it shows that temp storage is not used since it is trying to access the master index during searching.
@lars what does loses indexed content mean?
And when everyone says they are using 'Azure', I'm assuming you mean Azure Apps right? not Azure VMs?
By now I'm sure you realize that Azure web apps store all files on a remote file share which can cause all sorts of issues. This temp storage technique is currently the only way to resolve issues caused by how their infrastructure is setup. I have discovered that we can persist local files in a different persistent temp storage area on Azure instead of the ASP.Net temp files (which are prone to being cleared when /bin folder changes). The things I'd like to pursue sometime in the future are:
Move this temp storage logic into Examine Core
Add configuration options to store the temp data in the persistent temp storage area on Azure apps instead of ASP.Net temp storage
Port the AzureDirectory blob storage functionality into Examine Core so that it works against Lucene 2.9 so people could have this option as well
Test using the logic that exists in AzureDirectory to perform the Sync logic between the main index and the temp index instead of performing a backup/copy as it does now
Create an Azure Search Examine provider - Darren already started this IIRC
I honestly have no idea when I'd find time to do any of this but it's on my ever growing TODO list... if anyone wants to help then let's start the discussion on the Examine GitHub repo.
Another thing to note is that Azure now supports running your sites 'locally'. I'm assuming they've enabled this feature specifically because people have so many problems running sites from remote file shares.
"Local Cache enables your Apps to copy their code to storage on the local VMs running their site. Normally, Apps are run from a network based disk. Changing to storage local to the VM greatly improve the performance of languages like PHP, Node.js, and any other platform that needs to frequently read the files running it."
@Shannon I'm sure I've sent you a few logs via zendesk with the only exceptions I have. Think I remember seeing the same as Ismail.
By "losing content" I mean that the index doesn't go corrupt. It goes empty and doesn't rebuild. Might end up with 0 docs, might end up with 5 docs. No apparent exceptions.
Anyway, I appreciate that this is close to impossible for HQ to tackle, and cross my fingers our collective efforts will solve this sooner or later.
We are running azure web app so hopefully the temp storage should fix the problem for us. Lars we are not doing runAsync=true however we have set
RebuildOnAppStart="false"
Just like you.
So for us, mostly we end up with 0 documents in both the internal and external indexes, although we once had it that external had 5 docs. We should have 20,000 plus in both.
Anyway, fingers crossed the temp storage will resolve our issue. Will report back with any more issues.
runAsync=true is irrelevant... Examine is always async, never ever ever ever set this to false. Just remove this setting altogether and it will run async by default.
RebuildOnAppStart - if this is false and you have no indexes then you have to manually build them. This setting does one thing: if your index doesn't exist during app startup, it will be created. If you set this to false and your index doesn't exist on app startup, then you will have no index at all and you will need to figure out how you get your index there.
This is before you've used "Sync" temp storage according to Ismail so let us know what the outcome of that is.
If you end up with a non-corrupt (i.e. readable) index that is simply empty, it's probably due to app restarts. Say for example your site starts and you have no index, Umbraco will start building it. Let's say that at that moment your site restarts - this could be due to any number of things, maybe you are deploying and the file copying is slow so your site is restarting multiple times because multiple /bin files are changing, or multiple config files are changing, etc... Then the index will probably be created but nothing put in it because the site has restarted at that moment.
If your index is already there, there's no way for it to suddenly end up with 0 docs, the only way this happens is if the index is rebuilt. This can happen if:
You rebuild the index manually
The indexes don't already exist and you have RebuildOnAppStart=true (which is the default if this setting isn't there)
You are scaling out your application and/or your site that is part of a load balanced cluster comes online after being offline for a long time, in which case it needs to 'cold boot', which means all caches are rebuilt.
A corrupted index (unreadable) is a different story - these are two different things.
When the webapp is shut down it notifies all IRegisteredObject instances, one of which is the ExamineManager: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/ExamineManager.cs#L310 which in turn disposes all indexers. If an index rebuild is currently happening and the app is shut off, we cannot just lock the app from shutting down and wait until re-indexing is done, since this would cause all sorts of problems. Instead the only option is to lock for a small amount of time (5 seconds) if there are still items in the queue, and release when either the items in the queue are processed or the timeout is finished. This is the only way I can see you ending up with 0 (or a few) items in your index - a rebuild was issued for one reason or another and the app domain shuts down.
Thanks for the details.
When I get back to the project in question, I'll create a derived indexer and add all sorts of logging with System.IO.File from all events to see if I can get some more details on when/how it happens.
So my issue occurs after scaling or when Azure moves the host machine from under you, which is what might be happening to you guys when your issue only happens every couple of weeks. I've not resolved our issue yet as it fell down the priority list.
This is relevant to Azure only and allows for using a different persistent location for locally stored indexes instead of the volatile storage space that is ASP.Net temp files.
I'll warn you that if you are not using this setting: http://issues.umbraco.org/issue/U4-7614 then at some stage your Local indexes will be cleared out. This is fine if you are using Sync because they will just be re-copied from your master directory but if you are using LocalOnly then they will need to be rebuilt and if you have RebuildOnAppStart=false then that will be a manual job.
My suggestion if you are using Azure, use the setting mentioned in http://issues.umbraco.org/issue/U4-7614 so that your locally stored indexes are not blown away if you change global.asax, /bin folder (amongst a few others) which will clear your ASP.Net temp files.
Apologies for the vague message. It had 0 documents. I am using Sync but I do not have tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine" set.
I am updating to that now, hopefully that should fix up everything.
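A sketch of what the provider entries might look like with that attribute added, assuming the stock external provider names. Putting the attribute on the searcher as well as the indexer is an assumption here, on the basis that both need to resolve the same local folder.

```xml
<!-- ExamineSettings.config (fragment): the two <add> entries sit inside their
     respective <providers> elements; other providers omitted -->

<!-- under ExamineIndexProviders -->
<add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     useTempStorage="Sync"
     tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"/>

<!-- under ExamineSearchProviders (assumption: same attribute needed here) -->
<add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
     useTempStorage="Sync"
     tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"/>
```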
Thanks Ismail, that certainly seems very odd and at this point I can only chalk this up to something strange going on with Azure and its file system. So the index is empty in the normal AppData/TEMP storage? If that is the case I don't think that AzureLocalStorageDirectory will help, because when using Sync it still needs to ensure that the master index stored in AppData/TEMP is written correctly, and since it somehow gets rewritten as empty that is not good.
Please keep me updated on this though, I may be wrong and this could 'solve' it. In any case it's better to use this than the ASP.Net temp file location.
I might have just thought of why indexes get zero'd out. The simplest answer is that Azure has transferred your site to a new web worker with a totally different machine name, and if you have your Examine configuration as the default that we ship with, with the {machinename} token in your paths, this would of course mean the machine name is now totally different from before and the index no longer exists at that location. Can you confirm this is how you have your index paths configured? If so, then I will have to assume this is probably the cause and something I didn't really consider before!
I can confirm we have this issue every time Azure moves us to a new host machine (I mentioned this above).
I keep an eye out for whether we've been moved (in the umbracoServer DB table) and if we have, I have to manually kick off the indexers. The same goes if we manually scale out.
We are using the LocalOnly setting (as opposed to Sync) so the AppData\Temp path only contains empty folders at the moment.
Ok, this sounds like the issue!! So YES you can fix this, just remove the {machinename}/ token from your paths.
This is there primarily for load balancers - so people don't have to manually configure this when they wish to load balance. I did not take into account this scenario when working in virtualized cloud appliances where sites are moved from one machine to another.
I'll update the defaults to not include this and update docs for load balancers who do actually require this setting.
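As a rough illustration of that change in ExamineIndex.config: the set names and base path below are assumed to match a default install (they are consistent with the path in the stack trace earlier in the thread), and any child elements of the index sets are left out.

```xml
<!-- ExamineIndex.config (fragment), assuming default set names -->
<ExamineLuceneIndexSets>
  <!-- before: IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/External/" -->
  <!-- after: the {machinename}/ segment is removed from every path -->
  <IndexSet SetName="InternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Internal/"/>
  <IndexSet SetName="InternalMemberIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/InternalMember/"/>
  <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/External/"/>
</ExamineLuceneIndexSets>
```

Since the path changes, the existing index folders under the old machine-name directory will not be picked up, so expect to rebuild (or move the folders) once after making this change.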
@James - No Azure websites is not load balanced. There is no more than a single instance of a web worker accessing the db and file system at any given time. This is not load balancing. If you do run more than one instance - this can take several shapes:
Scaling out on Azure websites
In this case - you would still require the {machinename} token so that your indexes are stored per machine
Non Azure websites (i.e. VMs) and/or doing load balancing according to our docs
Requiring the {machinename} token would be dependent on how your file system is set up - shared vs replicated. If it is replicated the token is not required
@Lars - yes rebuildOnAppStart="true" (which is the default if not specified) should trigger a full rebuild when no index exists. So I agree there is some other issue at play here. When you do scale out, it does rebuild the indexes on the newly created web workers, I've never seen this not happen in all of my tests. I would have to assume something quite specific happens when a site is moved to another worker/machine and is most likely due to something Azure does with its file systems and controlling how ASP.Net app domains restart. Perhaps in some way Examine attempts to rebuild the index - this starts out by creating an empty index and then maybe based on some timing or whatever Azure is doing in the background it terminates the app domain again leaving an empty index. On the next restart Examine will not rebuild because there's a valid (although empty) index there.
It would be interesting to add Debug level logging to your Examine loggers so you can see the full log output of Examine during these times. You can do this without changing your global log4net logging level by targeting a specific logger, for example see: https://github.com/umbraco/Umbraco-CMS/blob/dev-v7/src/Umbraco.Web.UI/config/log4net.config under the <!--Here you can change the way logging works for certain namespaces --> text. You could add
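A per-namespace override along those lines might look like the sketch below. The `<logger>`/`<level>` syntax is standard log4net; the logger names "Examine" and "UmbracoExamine" are assumptions based on loggers being named after the namespace of the logging type, so adjust them to whatever names actually show up in your log file.

```xml
<!-- log4net.config (fragment): place next to the other per-namespace <logger> entries -->
<!-- logger names are assumed, not confirmed -->
<logger name="Examine">
  <level value="DEBUG" />
</logger>
<logger name="UmbracoExamine">
  <level value="DEBUG" />
</logger>
```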
There's certainly a way that we can work around this behavior I just need to determine if it's an Examine change needed or if it should be in UmbracoExamine, but we could:
On app start when no indexes are detected and are scheduled to be rebuilt, we create marker files for each index
When indexes are rebuilt, these marker files are removed
If the app domain is terminated before the rebuild occurs, the marker files will remain
During app startup we also check for these marker files, if they exist but a valid index is there, we still rebuild
Well, then again I've still got rebuildOnAppStart="false", so that's a faceslap. We'll go back to true. :/
Soz. (Quite preoccupied these days, this is less than tertiary in brain process)
@Shannon - Do you mean to say Azure websites is not load balanced unless you scale out? Since an Azure website can be scaled at any time (if enabled) I would always want to configure an Azure website to work in a load-balanced manner, even if it runs as one instance 90% of the time.
I haven't specified rebuildOnAppStart so should be using the default of "true". I posted my trace logs on the following forum post previously:
@James - when you are not scaling out on Azure websites you are definitely not load balancing. If you are scaling out then you are ... BUT, you need to do this in a particular way with a master and slave setup (see docs, NOTE: we do not support having a single Azure website instance that scales). In this case (and based on this problem above), you would need to:
Set your Examine config on your master (non-scaled) environment to not have the {machinename} token
Set your Examine config on your slave (scalable) environment to have the {machinename} token
I realize though that if you have a scalable solution setup - you have a master + slave environment - and you only have a single slave (not-scaled), then this potential problem remains since when your slave is moved between machines it will need to rebuild its indexes since the machine name will have changed... this rebuild should happen automatically. Though there is a chance something like this could happen when sites are transferred to a different machine: https://our.umbraco.org/forum/developers/extending-umbraco/74731-examine-corruption-issues#comment-243531
I would have to see if I can actually somehow replicate such an issue to see if writing a marker file would actually solve such a problem since we don't actually know if this is a problem or not.
In the future I would like to backport the AzureDirectory project for Lucene to work with Lucene 2.9 and release under a different fork. Then we can utilize this for more effective scaling with regards to Lucene.
Not an easy one. I should re-state that we didn't have this problem until we upgraded from 7.3.1 to 7.3.7, although I've never been sure if the upgraded binaries caused it, or a mistake in my upgrade process.
Is there a status update on this issue? We've got an Azure website where scaling is enabled. For now we removed the {machinename} token and everything seems to be working.
Also good to mention is that all editing is done on a separate environment (VM on Azure). There is no content creation on the Web app except for creating members.
The issue we ran into is that when Azure changes the machine name the indexes need to be rebuilt. If you have the machine name in your index path this will always happen. Without it, it doesn't.
And we have some huge indexes. So building them from scratch took a long time that caused other issues for us.
Dave explained the environment setup a bit better ;-). So far with this setup it's still best to remove the {machinename} token, but please correct us if we are wrong :-).
@Shannon you said: "I realize though that if you have a scalable solution setup - you have a master + slave environment - and you only have a single slave (not-scaled), then this potential problem remains since when your slave is moved between machines it will need to rebuild its indexes since the machine name will have changed... this rebuild should happen automatically."
Like Dave said we have huge indexes and building from scratch took too long. So that is not really an option for us.
@jeroen If you are scaling, then perhaps it's better to have 2x scaled out websites setup - and to reiterate - you MUST have {machinename} in the path when you are scaling out. In this setup, you would have 2 instances that are active, one of which should certainly have available indexes, if the other one gets moved to another server and needs to rebuild, Azure should consider that one non responsive and send more requests to the one that does have an index available. Then when you scale out more, yes each one would need to rebuild there as well.
@all:
Currently there is no perfect solution to this, you would need to write your own until we can have something in place which may require updates to Examine and/or UmbracoExamine. To solve your problem, here's what you can do:
Create sub classes of the examine indexers and searchers shipped with Umbraco and use an AzureDirectory instance instead of the default lucene Directory instance
What AzureDirectory does is similar to what Umbraco is doing:
It stores the master index in Blob storage
Only a master server can write to it
For each slave server, the blob storage index files are synced to the local machine
OR
There's potentially an 'easy' solution in which you just deal with the fact that the index will need rebuilding on new workers (or if workers are moved between hosts)
In this case we can terminate any request coming to the website when we detect that an index or the app isn't ready, returning a 503 with a Retry-After header. This should tell the Azure Load Balancer that the node isn't ready and the Azure LB will try again based on its own timeout.
OR
During startup you could have some logic to detect if there isn't an index on the slave server
In that case you could create some sort of REST endpoint on your master server that your slaves could talk to and request a snapshot of the index which they can download and store locally
OR
Attempt to create an Azure Search or Elastic Search Examine provider and host indexes in a centralized place
OR
If it's just the Member index that is the main problem regarding rebuilding - only the back office member search uses the member indexer, I don't believe there is anything else in the core that specifically uses the Member indexer
You could disable the member indexer and index members in your own custom way with Azure Search, Elastic Search, etc... and use those APIs to search your member data
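As a rough illustration of the 503 option above, the gate could look something like this. It is a language-agnostic sketch written in Python, not Umbraco or Examine code; `gate_request`, `index_ready` and the 30 second default are made-up names and values for illustration only.

```python
from typing import Dict, Tuple


def gate_request(index_ready: bool, retry_after_seconds: int = 30) -> Tuple[int, Dict[str, str]]:
    """Return (status code, extra headers) for an incoming request.

    Hypothetical helper: how `index_ready` is determined is left to the site.
    """
    if index_ready:
        # Index is available, let the request through as normal.
        return 200, {}
    # Index (or app) not ready: tell the caller to come back later.
    return 503, {"Retry-After": str(retry_after_seconds)}


# While the index is rebuilding, requests are refused with a retry hint.
status, headers = gate_request(index_ready=False, retry_after_seconds=15)
```

How the load balancer reacts to the 503 is up to Azure, so this only covers the part the site itself controls.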
Due to the nature of Azure, the way it structures its file systems, the way it virtualizes things and moves sites between workers - there isn't a perfect solution that we can simply ship with from an Umbraco core perspective that will solve everyone's particular problems. Not everyone uses Azure, and there would probably be other/different issues with other virtualized hosts that can scale out - and again, a specific solution would probably need to be created for that in one way or another.
I would certainly enjoy some help with any of this since there are quite a lot of options, work, etc...
We took out machine name from the config and we turned on tempStorage. It all seemed to be working fine however now we get regular:
System.IO.DirectoryNotFoundException: Could not find a part of the path 'D:\local\Temporary ASP.NET Files\root\a38b9d33\a911d035\App_Data\TEMP\ExamineIndexes\Internal\segments_7'.
at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
at UmbracoExamine.UmbracoExamineSearcher.OpenNewReader()
at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 307
--- End of inner exception stack trace ---
errors. For now I have turned off tempStorage and will see how it behaves. So no machine name in config and auto rebuild is false. My question, I guess for Shannon: will this still barf as Azure does its voodoo? If it does, am I going to have to do daily 2am index rebuilds?
Storing index files in asp.net temp storage is slightly dangerous because that storage is volatile (i.e. if your /bin or global.asax changes it will be cleared out). Also because Azure does some voodoo when moving your site between web workers this also becomes a pain. With U4-7614, it means the local azure storage is less volatile ... though when azure moves your site between web workers your index will still need to rebuild - unless of course you have 'Sync' turned on (which is the whole point of sync).
Let's re-iterate some facts:
Lucene + Azure web apps (Remote file storage) == Awful for performance
Lucene files need to be stored locally for good performance == useTempStorage
The better option for local Lucene storage with Azure is with U4-7614
Having rebuild on startup turned off + LocalOnly (Sync turned off) + useTempStorage == you will not have indexes
Having {machinename} in your path on Azure is annoying because Azure moves your site between workers/machines so the path will change. BUT if you are auto-scaling on Azure then you have to have {machinename} there or else everything will explode. But this means that when a site comes online and the index location doesn't exist, it needs to be rebuilt
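To make the {machinename} point concrete, here is a small sketch of why an index seems to vanish after a move. It is Python rather than the actual Examine code, and `resolve_index_path`, the path template and the worker names are assumptions for illustration only.

```python
import os
import tempfile


def resolve_index_path(template: str, machine_name: str) -> str:
    """Substitute the {machinename} token (hypothetical stand-in for what the config does)."""
    return template.replace("{machinename}", machine_name)


template = "App_Data/TEMP/ExamineIndexes/{machinename}/External"

with tempfile.TemporaryDirectory() as root:
    # The index was built while the site ran on the first worker.
    old_path = os.path.join(root, resolve_index_path(template, "WORKER_A"))
    os.makedirs(old_path)

    # The site is moved: same files, different machine name, different path.
    new_path = os.path.join(root, resolve_index_path(template, "WORKER_B"))
    index_found_after_move = os.path.isdir(new_path)  # nothing exists here yet
```

The old index is still on disk, it is just no longer where the site is looking, so the new worker has to rebuild.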
In my opinion, if you are using Azure web apps and are NOT auto-scaling, you should use these settings:
Using Azure web apps and balancing with auto scaling my front end workers, I have the recommended configuration as per your post, however I have a problem:
When the number of workers scales out the machine name changes and the 'RebuildOnAppStart=true' setting doesn't seem to be working. The indexes aren't rebuilt and are broken due to the machine name change.
As always, it's difficult to help without knowing version information, etc... There's been reports of 'cold boot' not working effectively on 7.3.x, please ensure you are using the latest version and see if that works. When a server comes online for the first time it 'cold boots', you can set the log4net level to Debug and see if you get log entries for index items being created. If you don't it's probably an issue with the umbraco version you are using.
If you are using Azure web apps and are load balancing w/ auto-scaling your front-end workers then:
You must have the {machinename} token in your index path
Can you confirm that this should only be set on the Slave? I followed the instructions from the docs and set it on both master and slave. We have experienced many problems related to indexing and multiple restarts on warm up, often resulting in outages of over 20 minutes. I was about to disable the scaling options until I stumbled across this post.
Also very much looking forward to having an easy to configure AzureDirectory.
Just to chime in here, we're having identical issues to Ismail. Even though everything's set to use Sync, the Internal Indexer occasionally craps out looking for the segments file in the ASP.Net temp folder, rather than the user temp folder.
I have updated my config so i am now doing rebuild indexes on restart. Have pushed up to my azure staging I am hoping that after the web worker switch which I am assuming is like a restart it will find the empty index and rebuild it?
Examine corruption issues
Hello,
We are using Umbraco 7.3.4 and parts of the site use Examine searches extensively. Content updates to the site are occasional so publishes etc are not that frequent. On publish we do tap into examine events to inject fields into the index.
Since update to Umbraco 7.3.4 we are getting issues with indexes corrupting.
There is nothing in Umbraco logs also we are using elmah.io and there is nothing in elmah. Site is sitting on azure and we have looked at eventlog also cannot see anything there. We thought initially it may be something to do with app pool restarting and index rebuilding. So we updated the ExamineSettings config RebuildOnAppStart and set that to false. This fixed the issue for about 2 weeks however recently it happened again.
We are having to rebuild indexes and in some instances restarting app pool then rebuilding indexes to get the search powered functionality back up.
Has anyone else seen this?
Regards
Ismail
Could it be this? http://issues.umbraco.org/issue/U4-6338
We've had issues with indexes locking which then get out of sync / corrupted requiring app pool to be stopped to free up the lock and reindex. Applying the hot fix mentioned in that issue seems to have resolved it for us so far.
Matt
Matt,
Site is on azure so not sure if this applies?
Regards
Ismail
Hi Ismail,
We had some issues with Examine as well on an Azure web app and also noticed performance issues accessing Examine.
After changing some configuration our problems were solved.
Apply the settings on this page under Common load balancing setup : https://our.umbraco.org/documentation/Getting-Started/Setup/Server-Setup/load-balancing/
Then apply the settings described here : https://our.umbraco.org/documentation/Getting-Started/Setup/Server-Setup/load-balancing/flexible
Dave
Dave,
Will take a look however we are not load balancing
Regards
Ismail
Hi Ismail,
We only have one web app for the moment and these config changes made a drastic impact on performance.
Dave
Hi Ismail,
Are your logs showing errors similar to the ones in this issue?
http://issues.umbraco.org/issue/U4-7869
James,
Not that I can see, we'll take another peek though. Also we will upgrade to 7.3.7 ASAP. Hopefully that may fix it. Jeavon's investigations look very promising, and his coding skills there are a good reason for his back to back MVPs, the guy is a legend!!
Cheers
Ismail
Isn't he just!
Guys,
Still getting this issue at least once every 2 weeks. Anyone seen this before?
I did see this in the umbracolog:
System.UnauthorizedAccessException: Access to the path 'D:\home\site\wwwroot\App_Data\TEMP\ExamineIndexes\RD000D3A203E7D\External\Index\segments.gen' is denied.
at Lucene.Net.Store.SimpleFSDirectory.OpenInput(String name, Int32 bufferSize)
at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
at Lucene.Net.Index.SegmentInfos.Read(Directory directory)
at Lucene.Net.Index.SegmentInfos.ReadCurrentVersion(Directory directory)
at Lucene.Net.Index.DirectoryReader.IsCurrent()
at Lucene.Net.Index.DirectoryReader.DoReopenNoWriter(Boolean openReadOnly, IndexCommit commit)
at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 359
at Examine.LuceneEngine.Providers.LuceneSearcher.GetSearcher() in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 239
at Examine.LuceneEngine.Providers.BaseLuceneSearcher.Search(ISearchCriteria searchParams, Int32 maxResults) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\BaseLuceneSearcher.cs:line 175
at Application.BusinessLogic.Services.SearchService.SearchTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\SearchService.cs:line 252
at Application.BusinessLogic.Services.TalentDetailService.GetTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\TalentDetailService.cs:line 109
at Application.Web.ContentFinders.TalentContentFinder.TryFindContent(PublishedContentRequest request) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.Web\ContentFinders\TalentContentFinder.cs:line 33
at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
at Umbraco.Web.Routing.PublishedContentRequestEngine.FindPublishedContent()
at Umbraco.Web.Routing.PublishedContentRequestEngine.FindPublishedContentAndTemplate()
at Umbraco.Web.Routing.PublishedContentRequestEngine.PrepareRequest()
at Umbraco.Web.UmbracoModule.ProcessRequest(HttpContextBase httpContext)
at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Regards
Ismail
I gather from Twitter that you're using Azure? Have you tried setting the Indexers and Searchers to store data on the web worker? You can set it to LocalOnly or Sync. We've seen problems with the default configuration where performance is very poor because the indexes are stored on a remote fileshare. Apart from bad performance it's possible that file locking issues occur because sometimes your Azure site gets moved to a different web worker (server), however sometimes file locks are not released when a move happens and your files will be in use for some time leading to errors like the one above.
For config options check out the explanations here: http://issues.umbraco.org/issue/U4-5993
Sebastiaan,
We are using Azure. Currently in ExamineSettings.config we have
We did this because we thought the issue was after doing deploys the index was getting rebuilt and barfing. So looking at http://issues.umbraco.org/issue/U4-5993 we have 2 options:
1. Sync
2. LocalOnly
With RebuildOnAppStart set to false if we use either of those 2 options then after a deploy to bin would index get trashed thus requiring manual rebuild?
Regards
Ismail
Okay, so you tried to work around what exactly? Do you have a huge index?
In which case, I think Sync makes the most sense as the only thing it does is copy the existing indexes from the fileshare to your site's temp folder on the web worker. Then it updates both local and fileshare when updates are needed.
If you don't have a huge index that takes loads of time rebuilding then I would set RebuildOnAppStart to true again and see how it goes, and then use LocalOnly with that.
Sebastiaan,
It is a big index takes a while to rebuild. So i will try it with sync. We have staging setup on azure so will try those config updates then slam the site see what happens.
Cheers
Ismail
Great, try with Sync and RebuildOnAppStart=false then!
Sebastiaan,
I have turned this on for our staging site. I have also hit with some load and cannot get the indexes to fail over. However what I don't understand is how this swapping to TempStorage will fix the issue.
We are not load balancing, we are on a single Azure website. So does Azure under the hood do some voodoo NAS stuff with website folders?
We just want to confirm this will work before trying it on live as we have a cranky client and do not want to promise any more false dawns.
Regards
Ismail
The voodoo is that all of your site's files on Azure live on a fileshare, which means that they don't actually exist on the machine that your website is running from (so the IIS server points to a fileshare to get all the files).
This means there's a significant lag in actually reading and writing files.
Examine/Lucene.net obviously doesn't like this lag very much, part of the reason that it's so fast is that it has really fast access to files on disk. If the disk is remote it doesn't have that great access.
By setting up the TempStorage you force the files to actually move to the temp folder on the IIS server, so they no longer just live on the fileshare.
Hope that makes sense!
Aha that makes sense. Cool. We will get that deployed and hopefully should fix this issue.
Cheers
Ismail
Hi Seb
We looked at the code which uses the temp storage folder. It seems that if the temp storage doesn't exist, then it falls back to the standard one.
This can be a problem as there will be times when the temp storage doesn't exist, AND the standard index is broken.
So this "fix" would probably only mitigate the issue, not actually solve it. The real fix is to figure out why the index keeps getting broken.
I agree. And good luck with that. ;-)
I don't know what you mean "if it doesn't exist". If the indexes exist in the website (App_Data/TEMP/ExamineIndexes) and you set TempStorage to Sync then if the indexes do not exist in the ASP.NET temp folder when the site starts, the indexes will be copied from the fileshare to the web worker's ASP.NET temp folder and be kept in sync with the indexes on the fileshare. If it doesn't exist on the fileshare then.. yeah, you will need to manually build them as RebuildOnAppStart is set to false.
I'd say try this setup out first. Then consider setting RebuildOnAppStart back to true. This shouldn't constantly rebuild indexes, ONLY when they don't exist yet on app start.
You're right. We had RebuildOnAppStart = false.
You said: "if the indexes do not exist in the ASP.NET temp folder when the site starts, the indexes will be copied from the fileshare to the web worker's ASP.NET temp folder"
So the index is copied? It's not re-generated at the temp storage location?
The issue we had initially is that the index was corrupt. So what you're saying is the corrupt index will be copied?
What you had before doesn't count any more when you turn Sync on. Yes, they will be copied on app start then when any updates happen they get applied to both the web worker's asp.net temp folder AND the remote fileshare. If your site gets moved to a new web worker (this happens once in a while) the indexes once again first get copied from the fileshare to the web worker's asp.net temp folder and then updates will be applied in both places again.
If the index does manage to get corrupted on the fileshare then yes, it will be copied in that exact state. Obviously, you could set the setting to LocalOnly but it would mean a rebuild each time the site moves to a different server, which means startup time will be very long when that happens (as you've indicated you have a large amount of content to index).
So yes, again, it would be great to figure out why indexes get corrupted. I am no help there I'm afraid.
"If the index does manage to get corrupted on the fileshare then yes, it will be copied in that exact state."
This is why I drink :(
Firstly - it would seem that part of this problem is that you are using the old TempStorage (https://github.com/Shazwazza/UmbracoExamine.TempStorage) which is obsolete because the functionality is included in the Core. Furthermore, the functionality included in the core has many fixes and works much better. Secondly the legacy TempStorage provider is discontinued and will no longer be developed (I'll make a note of this on GitHub).
The only way that the index can actually get corrupted (meaning that it is unreadable or cannot be opened, for example if Lucene files are missing) is:
Having "Sync" turned on does not make your primary index storage (i.e. non temp storage) any more likely to get corrupted.
I plan on migrating this functionality into the Examine core at some stage whilst allowing more storage options for Azure but I have no time right now. The LocalOnly and Sync options are only available for Umbraco indexers/searchers, these options are not available for any custom indexers/searchers that you may have ... which is part of the reason this functionality needs to be migrated to Examine Core.
In the meantime, please ensure you are using the Umbraco Core indexers/searchers and not the TempStorage provider.
And as always, providing steps to reproduce goes a long way (once you are using the Umbraco Core indexers), this includes all information regarding how your environment is setup.
Shannon,
I am not using the obsolete package but using what is in core. We do not mess with lucene files directly in any way so rules that one out. The IIS process unexpectedly terminating could be a possibility.
We are not using any custom indexers just External and internal indexers.
I will keep an eye on it and if it goes again try and report back a bit more information.
Regards
Ismail
Here's the config we have on the site that loses indexed content most often and rebuild from a local server like so. (A couple others, but every six months) I guess it's like it should be?
@ismail Do you have runAsync="true" btw?
Strikes me that async combined with IIS process terminating might be the real culprit.
Just FYI: runAsync="true" is totally unnecessary.
I need to understand what the actual issue people are seeing here is. 'Corrupted' could mean many things. Ismail is the only one that has posted a stack trace. Is this the exact same issue everyone is seeing?
From that stack trace it shows that temp storage is not used since it is trying to access the master index during searching.
@lars what does "loses indexed content" mean?
And when everyone says they are using 'Azure', I'm assuming you mean Azure Apps right? not Azure VMs?
By now I'm sure you realize that Azure web apps store all files on a remote file share which can cause all sorts of issues. This temp storage technique is currently the only way to resolve issues caused by how their infrastructure is setup. I have discovered that we can persist local files in a different persistent temp storage area on Azure instead of the ASP.Net temp files (which are prone to being cleared when /bin folder changes). The things I'd like to pursue sometime in the future are:
I honestly have no idea when I'd find time to do any of this but it's on my ever growing TODO list... if anyone wants to help then lets start the discussion on the Examine GitHub repo.
Another thing to note is that Azure now supports running your sites 'locally' . I'm assuming they've enabled this feature specifically because people have so many problems running sites from remote file shares.
@Shannon I'm sure I've sent you a few logs via zendesk with the only exceptions I have. Think I remember seeing the same as Ismail.
By "losing content" I mean that the index doesn't go corrupt. It goes empty and doesn't rebuild. Might end up with 0 docs, might end up with 5 docs. No apparent exceptions.
Anyway, I appreciate that this is close to impossible for HQ to tackle, and cross my fingers our collective efforts will solve this sooner or later.
Shannon,
We are running azure web app so hopefully the temp storage should fix the problem for us. Lars we are not doing runAsync=true however we have set
Just like you.
So for us, mostly we end up with 0 documents in both the internal and external indexes, although we once had it that external had 5 docs. We should have 20,000 plus in both.
Anyway, fingers crossed the temp storage will resolve our issue. Will report back with any more issues.
Regards
Ismail
To re-iterate some facts:
runAsync=true is irrelevant... Examine is always async, never ever ever ever set this to false. Just remove this setting altogether and it will run async by default
RebuildOnAppStart - if this is false and you have no indexes then you have to manually build them. This setting does one thing: if your index doesn't exist during app startup, it will be created. If you set this to false and your index doesn't exist on app startup, then you will have no index at all and you will need to figure out how you get your index there.
@Shannon
The crux of our issue is that the document count ends up being 0.
So I guess we need to know what's causing that, and is there a possible fix/workaround?
This is before you've used "Sync" temp storage according to Ismail so let us know what the outcome of that is.
If you end up with a non-corrupt index (i.e. one that is readable) that is simply empty, it's probably due to app restarts. Say for example your site starts and you have no index, Umbraco will start building it. Let's say that at that moment your site restarts - this could be due to any number of things, maybe you are deploying and the file copying is slow so your site is restarting multiple times because multiple /bin files are changing, or multiple config files are changing, etc... Then the index will probably be created but nothing put in it because the site has restarted at that moment.
If your index is already there, there's no way for it to suddenly end up with 0 docs, the only way this happens is if the index is rebuilt. This can happen if:
A corrupted index (unreadable) is a different story - these are two different things.
There must be a way for it to suddenly end up with 0 docs.
We don't rebuild it manually.
We have RebuildOnAppStart = false
We only have one server, with fcnMode="single"
The only way I see it can end up with 0 is the DeleteAll statement in EnsureIndex. Somehow, it must have been called with true for force.
Workaround: https://gist.github.com/lars-erik/f1d1e0226f5a552dc278
Right.
We've got more or less the same numbers as Ismail. (0 or 5, but should have >20.000)
Just throwing out ideas here after skimming LuceneIndexer.cs:
Answers to your questions:
Only here, this is how index rebuild works: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L772
That said, this logic is unnecessary when rebuilding an existing index since Lucene can re-create an existing index using the same logic found in CreateNewIndex ( https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L732 ) without causing problems with current searches. See: https://github.com/Shazwazza/Examine/issues/37
Yes they do during indexing, I've not seen any unhandled exceptions, any errors during indexing will be reported and there's an event
We don't commit if there are errors: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L1545 It could be feasible to perform a rollback (which closes the writer) and then re-open the Writer. I don't think this is going to affect the issue you're seeing but could be worthwhile. If an error occurred though, you'd see it in your logs.
Yes, this is what happens: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L1545
Regarding app restarts, see this code here when an indexer is disposed: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L1856
When the webapp is shut down it notifies all IRegisteredObject instances, which includes the ExamineManager: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/ExamineManager.cs#L310 which in turn disposes all indexers. If an index rebuild is currently happening and the app is shut off, we cannot just lock the app from shutting down and wait until re-indexing is done since this would cause all sorts of problems. Instead the only option is to lock for a small amount of time (5 seconds) if there are still items in the queue and release when either the items in the queue are processed or the timeout is finished. This is the only way I can see you ending up with 0 (or a few) items in your index - a rebuild was issued for one reason or another and the app domain shuts down.
Thanks for the details.
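The shutdown behaviour described in the post above (wait a few seconds for the queue, then let go) can be pictured with this sketch. It is plain Python with invented names (`drain_on_shutdown`, `slow_index`) and a shortened timeout so it runs quickly, not the code in LuceneIndexer.cs.

```python
import time
from collections import deque


def drain_on_shutdown(queue: deque, process_item, timeout_seconds: float = 5.0) -> int:
    """Process queued items until the queue is empty or the timeout expires.

    Returns the number of items that never made it into the index.
    """
    deadline = time.monotonic() + timeout_seconds
    while queue and time.monotonic() < deadline:
        process_item(queue.popleft())
    return len(queue)


indexed = []


def slow_index(item: int) -> None:
    # Stand-in for writing one document to the index.
    time.sleep(0.01)
    indexed.append(item)


# A rebuild has just queued every document when the app domain shuts down.
pending = deque(range(20000))
left_behind = drain_on_shutdown(pending, slow_index, timeout_seconds=0.05)
```

Only the handful of items processed before the deadline end up in `indexed`, the rest are dropped with the app domain.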
When I get back to the project in question, I'll create a derived indexer and add all sorts of logging with System.IO.File from all events to see if I can get some more details on when/how it happens.
It does log quite a lot if you turn on Debug level logging in log4net
I know, but I don't want debug logging in production. I get more control if I just hack it. :)
(Oh well, I might as well get to know how to get only examine in debug mode. Lazy towards log4net config.)
Just wanted to reference my issue here, which sounds similar:
https://our.umbraco.org/forum/umbraco-7/using-umbraco-7/75289-azure-website-not-initializing-examine-index-after-scaling
So my issue occurs after scaling or when Azure moves the host machine from under you, which is what might be happening to you guys when your issue only happens every couple of weeks. I've not resolved our issue yet as it fell down the priority list.
Hi all, just thought I'd post up this since it hasn't been documented yet (and i sort of forgot about it): http://issues.umbraco.org/issue/U4-7614
This is relevant to Azure only and allows for using a different persistent location for locally stored indexes instead of the volatile storage space that is ASP.Net temp files.
Shannon,
We are using 7.3.8. Is the tempStorageDirectory property present in that version? The issue tracker says it's due in 7.3.5.
Also with Sync on our current location of index is
D:\local\Temporary ASP.NET Files\root\cb0cbdf4\7beae6bd\App_Data\TEMP\ExamineIndexes\RD000D3A207D42\Internal\
It's in App_Data not App_Code, so it should not get cleared out after updates to bin?
Regards
Ismail
If it says it is due in 7.3.5 then it is... you can always try it or look into the code.
If you are not using the option I just mentioned, it IS stored in ASP.Net temp files, just like your path says:
D:\local\Temporary ASP.NET Files\
Shannon,
I did do rebuild yesterday and indexes were still there so will just leave it for now like this. Keeping an eye on it so far so good.
Regards
Ismail
I'll warn you that if you are not using this setting: http://issues.umbraco.org/issue/U4-7614 then at some stage your Local indexes will be cleared out. This is fine if you are using Sync because they will just be re-copied from your master directory but if you are using LocalOnly then they will need to be rebuilt and if you have RebuildOnAppStart=false then that will be a manual job.
My suggestion if you are using Azure, use the setting mentioned in http://issues.umbraco.org/issue/U4-7614 so that your locally stored indexes are not blown away if you change global.asax, /bin folder (amongst a few others) which will clear your ASP.Net temp files.
Shannon,
The indexes died again. I will try tempStorageDirectory attribute see if that helps.
Regards
Ismail
Hi Ismail,
Can you please explain what "indexes died" means?
Do you mean it's just empty? or is it actually corrupt? Is there logs this time? I'm assuming this is now that you are using "Sync" or "LocalOnly"?
I really need everyone to be very specific when reporting issues. I don't know if your problem is the same as others and I cannot read minds.
Shannon,
Apologies for the vague message. It had 0 documents. I am using Sync but I do not have tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"
I am updating to that now hopefully that should fix up everything.
Regards
Ismail
Thanks Ismail, that certainly seems very odd and at this point I can only chalk this up to something strange going on with Azure and its file system. So the index is empty in the normal AppData/TEMP storage? If that is the case I don't think that AzureLocalStorageDirectory will help because when using Sync it still needs to ensure that the master index stored in AppData/TEMP is written correctly, and since it somehow gets rewritten as empty that is not good.
Please keep me updated on this though, i may be wrong and this could 'solve' it. In any case it's better to use this than the ASP.Net temp file location.
I might have just thought of why indexes get zero'd out. The simplest answer is that Azure has transferred your site to a new web worker with a totally different machine name and if you have your Examine configuration as the default that we ship with, with the {machinename} token in your paths, this would of course mean the machine name is now totally different from before and the index no longer exists at that location. Can you confirm this is how you have your index paths configured? If so, then I will have to assume this is probably the cause and something I didn't really consider before!
Shannon,
That is how they are currently configured. Is there anything we can do so that they are not configured this way?
Regards
Ismail
We are also having the issues Ismail reported.
We have this in our config ;
And have the machine name in the path. We mostly face issues when azure transfers to a new Web worker which seems to happen quite often.
Dave
Hi Shannon,
I can confirm we have this issue every time Azure moves us to a new host machine (I mentioned this above).
I keep an eye out for if we've been moved (in the umbracoServer DB table) and if we have I have to manually kick off the indexers. The same goes if we manually scale out.
We are using the LocalOnly setting (as opposed to Sync) so the AppData\Temp path only contains empty folders at the moment.
James.
This is spot on!
I can confirm that our "index watcher" has rebuilt indexes at the exact same times as the registeredDate columns in umbracoServer. :)
Great work, James!
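For reference, the general shape of an "index watcher" is something like the sketch below. This is only an assumption about the idea (compare the document count against what you expect and kick off a rebuild), written in Python with hypothetical callbacks; it is not the code from the gist linked earlier in the thread.

```python
from typing import Callable


def check_index(get_doc_count: Callable[[], int],
                rebuild: Callable[[], None],
                expected_minimum: int) -> bool:
    """Trigger a rebuild when the index holds fewer docs than expected.

    Returns True when a rebuild was started.
    """
    if get_doc_count() < expected_minimum:
        rebuild()
        return True
    return False


rebuilds = []

# Index has been zero'd out: 5 docs where thousands are expected.
triggered = check_index(lambda: 5, lambda: rebuilds.append("External"), expected_minimum=1000)
```

Running a check like this on a timer is a workaround rather than a fix, since it only notices the empty index after the fact.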
James,
When you say keep an eye out for it, how do you mean? Do you poll that table or you have a trigger on it that then does the re index?
Regards
Ismail
Ok, this sounds like the issue!! So YES you can fix this, just remove the {machinename}/ token from your paths.
This is there primarily for load balancers - so people don't have to manually configure this when they wish to load balance. I did not take into account this scenario when working in virtualized cloud appliances where sites are moved from one machine to another.
I'll update the defaults to not include this and update docs for load balancers who do actually require this setting.
I already updated my environments. Will keep you posted if we keep seeing issues.
Dave
But shouldn't rebuildOnAppStart trigger and do full rebuild when the site is moved? Seems like it doesn't.
Surely Azure should be considered a "Load balanced" environment? What is the fix if you run more than one instance?
@James - No Azure websites is not load balanced. There is no more than a single instance of a web worker accessing the db and file system at any given time. This is not load balancing. If you do run more than one instance - this can take several shapes:
@Lars - yes rebuildOnAppStart="true" (which is the default if not specified) should trigger a full rebuild when no index exists. So I agree there is some other issue at play here. When you do scale out, it does rebuild the indexes on the newly created web workers, I've never seen this not happen in all of my tests. I would have to assume something quite specific happens when a site is moved to another worker/machine and is most likely due to something Azure does with its file systems and controlling how ASP.Net app domains restart. Perhaps in some way Examine attempts to rebuild the index - this starts out by creating an empty index and then maybe based on some timing or whatever Azure is doing in the background it terminates the app domain again leaving an empty index. On the next restart Examine will not rebuild because there's a valid (although empty) index there.
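The startup rule described above is easy to misread, so here it is as a tiny sketch. It is Python pseudologic with a made-up function name, and it models "no index on disk" as `None` and "valid but empty index" as an empty list; the real check lives in Examine and may differ in detail.

```python
from typing import List, Optional


def should_rebuild_on_start(index: Optional[List[str]], rebuild_on_app_start: bool) -> bool:
    """Rebuild only when the setting is on AND no index exists at all.

    `index` is None when nothing exists on disk; an empty list is an
    existing, readable index that happens to contain 0 documents.
    """
    return rebuild_on_app_start and index is None


# First start on a new worker: no index, so a rebuild is started.
first_start = should_rebuild_on_start(None, rebuild_on_app_start=True)

# The app domain was killed mid-rebuild and left an empty index behind:
# on the next start it counts as existing, so nothing is rebuilt.
second_start = should_rebuild_on_start([], rebuild_on_app_start=True)
```

With rebuildOnAppStart="false" the function never returns True, which matches the point made earlier in the thread that a missing index then stays missing until someone rebuilds it manually.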
It would be interesting to add Debug level logging to your Examine loggers so you can see the full log output of Examine during these times. You can do this without changing your global log4net logging level by targeting a specific logger, for example see: https://github.com/umbraco/Umbraco-CMS/blob/dev-v7/src/Umbraco.Web.UI/config/log4net.config under the
<!--Here you can change the way logging works for certain namespaces -->
text. You could add
There's certainly a way that we can work around this behavior, I just need to determine if it's an Examine change needed or if it should be in UmbracoExamine, but we could:
Well, then again I've still got rebuildOnAppStart="false", so that's a faceslap. We'll go back to true. :/
Soz. (Quite preoccupied these days, this is less than tertiary in brain process)
@Shannon - Do you mean to say Azure websites is not load balanced unless you scale out? Since an Azure website can be scaled at any time (if enabled) I would always want to configure an Azure website to work in a load-balanced manner, even if it runs as one instance 90% of the time.
I haven't specified rebuildOnAppStart so should be using the default of "true". I posted my trace logs on the following forum post previously:
https://our.umbraco.org/forum/umbraco-7/using-umbraco-7/75289-azure-website-not-initializing-examine-index-after-scaling
Hope that helps.
@James - when you are not scaling out on Azure websites you are definitely not load balancing. If you are scaling out then you are ... BUT, you need to do this in a particular way with a master and slave setup (see docs, NOTE: we do not support having a single Azure website instance that scales). In this case (and based on this problem above), you would need to:
I realize though that if you have a scale-able solution setup - you have a master + slave environment - and you only have a single slave (not-scaled), then this potential problem remains since when your slave is moved between machines it will need to rebuild its indexes since the machine name will have changed... this rebuild should happen automatically. Though there is a chance something like this could happen when sites are transferred to a different machine: https://our.umbraco.org/forum/developers/extending-umbraco/74731-examine-corruption-issues#comment-243531
I would have to see if I can actually somehow replicate such an issue to see if writing a marker file would actually solve such a problem since we don't actually know if this is a problem or not.
In the future I would like to backport the AzureDirectory project for Lucene to work with Lucene 2.9 and release under a different fork. Then we can utilize this for more effective scaling with regards to Lucene.
Not an easy one. I should re-state that we didn't have this problem until we upgraded from 7.3.1 to 7.3.7, although I've never been sure if the upgraded binaries caused it, or a mistake in my upgrade process.
Hello,
Is there a status update on this issue? We've got an Azure website where scaling is enabled. For now we removed the {machinename} token and everything seems to be working.
Jeroen
Did you read above? https://our.umbraco.org/forum/developers/extending-umbraco/74731-examine-corruption-issues#comment-243584 I can't just remove the token on your scaling worker instance, I won't be able to scale... Unless you have local only turned on
This is our current setup for the external indexer which we use a lot:
ExamineIndex.config
ExamineSettings.config
If we add the {machinename} token we run into issues where we could lose all index files.
Jeroen
Hi Jeroen, Shannon,
Also good to mention is that all editing is done on a separate environment (VM on Azure). There is no content creation on the Web app except for creating members.
The issue we ran into is that when Azure changes the machine name the indexes need to be rebuilt. If you have the machine name in your index path this will always happen. Without it, it doesn't.
And we have some huge indexes, so building them from scratch took a long time, which caused other issues for us.
Dave
Hello,
Dave explained the environment setup a bit better ;-). So far with this setup it's still best to remove the {machinename} token, but please correct us if we are wrong :-).
@Shannon you said: "I realize though that if you have a scale-able solution setup - you have a master + slave environment - and you only have a single slave (not-scaled), then this potential problem remains since when your slave is moved between machines it will need to rebuild its indexes since the machine name will have changed... this rebuild should happen automatically."
Like Dave said we have huge indexes and building from scratch took too long. So that is not really an option for us.
Jeroen
@jeroen If you are scaling, then perhaps it's better to have 2x scaled out websites setup - and to reiterate - you MUST have {machinename} in the path when you are scaling out. In this setup, you would have 2 instances that are active, one of which should certainly have available indexes, if the other one gets moved to another server and needs to rebuild, Azure should consider that one non responsive and send more requests to the one that does have an index available. Then when you scale out more, yes each one would need to rebuild there as well.
@all:
Currently there is no perfect solution to this, you would need to write your own until we can have something in place which may require updates to Examine and/or UmbracoExamine. To solve your problem, here's what you can do:
What AzureDirectory does is similar to what Umbraco is doing:
OR
OR
OR
OR
Due to the nature of Azure, the way it structures its file systems, the way it virtualizes things and moves sites between workers - there isn't a perfect solution that we can simply ship with from an Umbraco core perspective that will solve everyone's particular problems. Not everyone uses Azure, and there would probably be other/different issues with other virtualized hosts that can scale out - and again, a specific solution would probably need to be created for that in one way or another.
I would certainly enjoy some help with any of this since there are quite a lot of options, work, etc...
Guys,
We took out machine name from the config and we turned on tempStorage. It all seemed to be working fine however now we get regular:
at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
at UmbracoExamine.UmbracoExamineSearcher.OpenNewReader()
at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 307
--- End of inner exception stack trace ---
Errors. For now I have turned off tempStorage and will see how it behaves. So there is no machine name in the config and auto rebuild is false. My question, I guess for Shannon: will this still barf when Azure does its voodoo? If it does, am I going to have to do daily 2am index rebuilds?
Regards
Ismail
See above where I mention http://issues.umbraco.org/issue/U4-7614, available in 7.3.5+ specifically for Azure
Storing index files in asp.net temp storage is slightly dangerous because that storage is volatile (i.e. if your /bin or global.asax changes it will be cleared out). Also because Azure does some voodoo when moving your site between web workers this also becomes a pain. With U4-7614, it means the local azure storage is less volatile ... though when azure moves your site between web workers your index will still need to rebuild - unless of course you have 'Sync' turned on (which is the whole point of sync).
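To make the temp storage settings a bit more concrete, here is a rough sketch of an indexer and a searcher entry in ExamineSettings.config with synced temp storage pointed at Azure local storage. The attribute values are my understanding of what U4-7614 introduced in 7.3.5+ and should be checked against the docs for your version. The provider names are the Umbraco defaults, which is an assumption about your setup.

```xml
<!-- ExamineSettings.config fragments: sketch only, verify against your Umbraco version -->

<!-- Inside ExamineIndexProviders/providers -->
<add name="ExternalIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     useTempStorage="Sync"
     tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine" />

<!-- Inside ExamineSearchProviders/providers -->
<add name="ExternalSearcher"
     type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
     useTempStorage="Sync"
     tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine" />
```

Keeping the indexer and its matching searcher on the same temp storage settings is probably the safest choice, otherwise they may end up looking in different folders.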
Let's re-iterate some facts:
useTempStorage
In my opinion, if you are using Azure web apps and are NOT auto-scaling, you should use these settings:
If you are using Azure web apps and are load balancing w/ auto-scaling your front-end workers then:
Hi,
Using Azure web apps and balancing with auto scaling my front end workers, I have the recommended configuration as per your post, however I have a problem-
When the number of workers scales out the machine name changes and the 'RebuildOnAppStart=true' setting doesn't seem to be working; the indexes aren't rebuilt and are broken due to the machine name change.
Regards, Jamie
As always, it's difficult to help without knowing version information, etc... There have been reports of 'cold boot' not working effectively on 7.3.x, please ensure you are using the latest version and see if that works. When a server comes online for the first time it 'cold boots', you can set the log4net level to Debug and see if you get log entries for index items being created. If you don't, it's probably an issue with the Umbraco version you are using.
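A minimal sketch of what that Debug logging could look like in config/log4net.config, without touching the global level. The logger names here are an assumption that they match the Examine and UmbracoExamine namespaces, and if your appender has its own threshold or filter set above Debug these entries may still be dropped.

```xml
<!-- log4net.config: sketch only; place alongside the other <logger> elements -->
<logger name="Examine">
  <level value="DEBUG" />
</logger>
<logger name="UmbracoExamine">
  <level value="DEBUG" />
</logger>
```

Remember to remove these again once you have captured a cold boot, since Debug output can get large quickly.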
Hi,
Using version 7.3.8 at the moment - limited to this due to Archetype support for now. Is there a known issue with it?
Cheers, Jamie
Can you confirm that this should only be set on the Slave? I followed the instructions from the docs and set it on both master and slave. We have experienced many problems related to indexing and multiple restarts on warm up, often resulting in outages of over 20 minutes. I was about to disable the scaling options until I stumbled across this post.
Also very much looking forward to having an easy to configure AzureDirectory.
Many thanks
Chris
Shannon,
I had:
So useTempStorage=sync and tempstorage directory was for local azure storage. Using Umbraco 7.3.8 so this should not be using folder like
I am confused?
Regards
Ismail
Shannon,
One thing in the issue tracker you mention:
So in my config for the searcher bit I should have:
as well?
So my whole examinesettings.config should look like:
Regards
Ismail
Just to chime in here, we're having identical issues to Ismail. Even though everything's set to use Sync, the Internal Indexer occasionally craps out looking for the segments file in the ASP.Net temp folder, rather than the user temp folder.
Tim,
I have updated my config so I am now rebuilding indexes on restart. I have pushed it up to my Azure staging. I am hoping that after the web worker switch, which I am assuming is like a restart, it will find the empty index and rebuild it?
Regards
Ismail
Tim, I feel like all of this information is just being lost or people are not finding it and/or people are simply not reading it. Please see: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/74731-examine-corruption-issues#comment-246649
Hi Shannon,
We had all of that set, except the rebuild on startup, have re-published and will keep an eye on it and see if the issue happens again.
If you upgrade to Examine 0.1.69-beta it's easier to use AzureDirectory like Shannon described here: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/74731-examine-corruption-issues#comment-244293
I've got it working and my Examine indexes are now stored in blob storage. More info about my setup in this topic: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78818-using-azuredirectory-with-examine
Jeroen
How's the performance of the blob storage provider?
See this comment ;-) https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78818-using-azuredirectory-with-examine#comment-251847