  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 21, 2016 @ 09:40

    Using AzureDirectory with Examine

    Hello,

    I'm trying to use AzureDirectory with Examine 0.1.69-beta. Based on this tweet I did the following:

    I created an AzureDirectoryFactory:

    public class AzureDirectoryFactory : IDirectoryFactory
    {
        public Directory CreateDirectory(LuceneIndexer indexer, string luceneIndexFolder)
        {
            return new AzureDirectory(CloudStorageAccount.Parse(ConfigSettings.AzureConnectionString), ConfigSettings.AzureUmbracoExamineContainerName);
        }
    }
    

    I've updated the ExamineSettings.config file like this:

    <add name="ContentIndexer" 
          type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
          directoryFactory="Project.Web.Core.ExamineAzure.AzureDirectoryFactory, Project.Web.Core"/>
    

    After running the code I see the new container in the blob storage, but it's empty.

    In the error log I get the following exception:

    Provider=ContentIndexer, NodeId=-1
    System.Exception: An error occurred creating the index,Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (404) Not Found. ---> System.Net.WebException: The remote server returned an error: (404) Not Found.
       at System.Net.HttpWebRequest.GetResponse()
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       --- End of inner exception stack trace ---
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       at Examine.Directory.AzureDirectory.AzureLock.IsLocked() in d:\data\inetpub\project\Sources\Project.Web.Core\ExamineAzure\AzureLock.cs:line 29
       at Examine.LuceneEngine.Providers.LuceneIndexer.CreateNewIndex(Directory dir) in X:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 700
    Request Information
    RequestID:80c08410-0001-0029-662c-e32123000000
    RequestDate:Thu, 21 Jul 2016 08:45:29 GMT
    StatusMessage:The specified blob does not exist.
    , IndexSet: ContentIndexSet
    

    In the code the error happens at:

    var blob = _azureDirectory.BlobContainer.GetBlobReferenceFromServer(_lockFile);
    

    It's here on GitHub: https://github.com/Shazwazza/Examine/blob/master/Examine.Directory.AzureDirectory/AzureLock.cs#L29

    Do I need to create the blob manually, or is there something else I'm doing wrong?

    Thanks for your help.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 21, 2016 @ 11:25

    This was a bug in the AzureDirectory project. It has been fixed there: https://github.com/azure-contrib/AzureDirectory/issues/20

    I've also updated my own version of AzureDirectory, which is a copy of Shannon's code: https://github.com/Shazwazza/Examine/blob/master/Examine.Directory.AzureDirectory/AzureDirectory.cs

    I've created a pull request to also fix it in there: https://github.com/Shazwazza/Examine/pull/49

    Now my Examine indexes are stored in Azure blob storage :-).

    Jeroen

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Jul 21, 2016 @ 11:33

    Jeroen,

    How big is your index? What is performance like?

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 21, 2016 @ 11:53

    It reads locally and writes to both, so performance is only affected on writes, and writes are buffered.
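To make the read/write split concrete, here is a minimal, hypothetical sketch (not the AzureDirectory implementation): reads are served from a local store only, while writes land locally at once and are queued for the remote store until a flush. Both stores are plain in-memory dictionaries standing in for the local disk and blob storage, and `BufferedMirrorStore` and its members are invented names.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: reads use the local store only; writes are applied
// locally right away and buffered for the remote store until Flush().
public class BufferedMirrorStore
{
    // Stand-in for the local file system cache.
    public Dictionary<string, byte[]> Local { get; } = new Dictionary<string, byte[]>();

    // Stand-in for Azure blob storage.
    public Dictionary<string, byte[]> Remote { get; } = new Dictionary<string, byte[]>();

    // Pending remote writes, keyed by file name so a later write replaces an earlier one.
    private readonly Dictionary<string, byte[]> _pending = new Dictionary<string, byte[]>();

    public int PendingCount => _pending.Count;

    public void Write(string name, byte[] data)
    {
        Local[name] = data;    // local copy is updated immediately
        _pending[name] = data; // remote copy is updated later, in a batch
    }

    public byte[] Read(string name)
    {
        // Reads never touch the remote store in this sketch.
        return Local[name];
    }

    // Push all buffered writes to the remote store in one go.
    public void Flush()
    {
        foreach (var entry in _pending)
        {
            Remote[entry.Key] = entry.Value;
        }
        _pending.Clear();
    }
}
```

In this model a read costs the same as with a plain local index; only `Flush` would involve the network.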

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 21, 2016 @ 12:17

    Hmm, suddenly I'm getting this error in the logs:

    Provider=InternalIndexer, NodeId=-1
    System.Exception: Cannot index queue items, the index is currently locked,, IndexSet: InternalIndexSet
    

    Also somehow the Examine Management dashboard won't load anymore. I get this exception:

    Failed to retrieve indexer details
    
    The 'ObjectContent`1' type failed to serialize the response body for content type 'application/json; charset=utf-8'.
    
    EXCEPTION DETAILS:
    
    System.InvalidOperationException: The 'ObjectContent`1' type failed to serialize the response body for content type 'application/json; charset=utf-8'.
    

    It used to work, and I didn't really change anything.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 21, 2016 @ 14:15

    The reason I got these lock exceptions was my local storage: I hadn't configured it, so multiple indexes used the same folder. Here is my AzureDirectoryFactory in which this is fixed:

    public class AzureDirectoryFactory : IDirectoryFactory
    {
        public Directory CreateDirectory(LuceneIndexer indexer, string luceneIndexFolder)
        {
            return new AzureDirectory(
                CloudStorageAccount.Parse(ConfigSettings.AzureConnectionString), 
                ConfigSettings.AzureUmbracoExamineContainerName,
                GetLocalStorageDirectory(luceneIndexFolder),
                rootFolder: indexer.IndexSetName);
        }
    
        private static Directory GetLocalStorageDirectory(string luceneIndexFolder)
        {
            var catalogDir = new DirectoryInfo(luceneIndexFolder);
            if (!catalogDir.Exists)
            {
                catalogDir.Create();
            }
    
            return FSDirectory.Open(catalogDir);
        }
    }
    

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 21, 2016 @ 16:15

    The above code worked locally, but not on a load-balanced setup in Azure. The following code does work and should be the final version:

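    using System;
    using System.IO;
    using System.Security.Cryptography;
    using System.Text;
    using System.Web;
    using Examine.Directory.AzureDirectory;
    using Examine.LuceneEngine.Providers;
    using Lucene.Net.Store;
    using Microsoft.WindowsAzure.Storage;
    // System.IO also defines a Directory type, so point the name at Lucene's
    using Directory = Lucene.Net.Store.Directory;
    // Also add the namespace that declares IDirectoryFactory in your Examine version;
    // ConfigSettings is a project-specific settings class.
    
    public class AzureDirectoryFactory : IDirectoryFactory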
    {
        public Directory CreateDirectory(LuceneIndexer indexer, string luceneIndexFolder)
        {
            return new AzureDirectory(
                CloudStorageAccount.Parse(ConfigSettings.AzureConnectionString), 
                ConfigSettings.AzureUmbracoExamineContainerName,
                GetLocalStorageDirectory(indexer.IndexSetName),
                rootFolder: indexer.IndexSetName);
        }
    
        private static Directory GetLocalStorageDirectory(string name)
        {
            var appDomainHash = ToMd5(HttpRuntime.AppDomainAppId);
            var cachePath = Path.Combine(
                Environment.ExpandEnvironmentVariables("%temp%"), 
                "LuceneDir",
                appDomainHash, 
                "App_Data", 
                "TEMP", 
                "ExamineIndexes", 
                name);
            var azureDir = new DirectoryInfo(cachePath);
            if (azureDir.Exists == false)
            {
                azureDir.Create();
            }
    
            return new SimpleFSDirectory(azureDir);
        }
    
        private static string ToMd5(string stringToConvert)
        {
            // MD5 is only used to turn the app id into a short, folder-safe name, not for security
            using (var md5 = MD5.Create())
            {
                var hashedByteArray = md5.ComputeHash(Encoding.UTF8.GetBytes(stringToConvert));
    
                // "x2" already produces lowercase hex, so no ToLower() is needed
                var stringBuilder = new StringBuilder(hashedByteArray.Length * 2);
                foreach (var b in hashedByteArray)
                {
                    stringBuilder.Append(b.ToString("x2"));
                }
    
                return stringBuilder.ToString();
            }
        }
    }
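The important change from the previous version is where the local cache lives. A plausible explanation (an assumption; the thread does not confirm it) is that on Azure App Service the site folder under `D:\home` is shared by all instances, while `%temp%` resolves to per-instance local storage (the `D:\local\Temp\...` paths Jeroen logs later in this thread). Below is a minimal sketch of the path scheme as a pure function, so it can be checked without IIS. `CachePaths` is a hypothetical helper, and the app id is passed in instead of being read from `HttpRuntime.AppDomainAppId`.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper mirroring the path scheme of GetLocalStorageDirectory above.
public static class CachePaths
{
    // Lowercase hex MD5 of the input, like ToMd5 in the factory.
    public static string Md5Hex(string value)
    {
        using (var md5 = MD5.Create())
        {
            var hash = md5.ComputeHash(Encoding.UTF8.GetBytes(value));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (var b in hash)
            {
                sb.Append(b.ToString("x2"));
            }
            return sb.ToString();
        }
    }

    // tempRoot: per-instance temp folder; appId: e.g. HttpRuntime.AppDomainAppId;
    // indexSetName: e.g. "InternalIndexSet". Each app and each index set get their own folder.
    public static string Build(string tempRoot, string appId, string indexSetName)
    {
        return Path.Combine(tempRoot, "LuceneDir", Md5Hex(appId),
            "App_Data", "TEMP", "ExamineIndexes", indexSetName);
    }
}
```

Two index sets never share a folder, and two sites on the same machine are separated by the hash of their app id.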
    

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 22, 2016 @ 08:23

    Thanks mate!! Of course I'll need to add all this stuff to the docs when I get around to it. Otherwise, it would be really, really amazing if you could create/update the docs on the GitHub wiki:

    https://github.com/shazwazza/examine/wiki

    The codebase should also have an AzureDirectoryFactory built in, to make this easier for folks to just use. As always, a PR would be super wonderful :)

    Thanks for testing this out, much appreciated!

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 22, 2016 @ 09:03

    Thanks. I'll try to write some docs and create a PR when I have time. Do you have any feedback on the code? Not sure if I've implemented everything the correct way.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 25, 2016 @ 07:29

    On the master server I occasionally get the following error:

    System.Exception: Cannot index queue items, the index is currently locked,, IndexSet: InternalIndexSet
    

    Not sure if this is a big problem. The indexes still seem to be up to date. The error happens here in the code: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L1452

    According to the logs, it happens every day at 05:21 for all indexes.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 25, 2016 @ 08:28

    I'll have to investigate that - it should definitely not happen. Not sure why it would do this on a schedule.

    Examine keeps an IndexWriter open for the lifetime of the app for performance reasons, and when the app is shut down or restarted it closes the writer and removes the directory lock. I wonder if there's an issue with removing the lock file from blob storage, or maybe it's a timing issue.

    I haven't set all of this up myself; hopefully I'll find some time soon.

    Is there any way you can replicate this issue, or only on this strange timer?
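The failure mode Shannon describes is a lock that only disappears when the writer is closed cleanly. Here is a minimal, hypothetical sketch of that lifecycle using a plain lock file on disk. It is not Lucene's or AzureDirectory's lock code, and `FileLock` is an invented name. Acquiring fails while another holder owns the file, and a holder that is never disposed leaves the index looking "currently locked".

```csharp
using System;
using System.IO;

// Hypothetical lock-file sketch: the lock is held for as long as the file exists.
public sealed class FileLock : IDisposable
{
    private readonly string _path;

    private FileLock(string path)
    {
        _path = path;
    }

    // Returns null if someone else already holds the lock.
    public static FileLock TryAcquire(string path)
    {
        try
        {
            // CreateNew throws if the file already exists, so only one holder can succeed.
            using (new FileStream(path, FileMode.CreateNew, FileAccess.Write))
            {
            }
            return new FileLock(path);
        }
        catch (IOException)
        {
            return null;
        }
    }

    // Releasing the lock means deleting the file. If this never runs (crash, or a
    // recycle without a clean shutdown), the file stays and the index looks locked.
    public void Dispose()
    {
        File.Delete(_path);
    }
}
```

With blob storage in the picture, the same reasoning applies to the lock blob: if deleting it fails or races with a new writer starting up, the next writer sees a stale lock.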

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 25, 2016 @ 08:40

    I've also seen it at different times for a different indexer:

    System.Exception: Cannot index queue items, the index is currently locked,, IndexSet: InternalMemberIndexSet
    

    I'm not sure if it's the lock file in blob storage. We've also got the local cache directory that I set up in the AzureDirectoryFactory. I'm not sure what type of Directory GetLocalStorageDirectory should return; for now it's using SimpleFSDirectory, because that is what I saw in EnvironmentTempLocationDirectoryFactory in the Examine source code.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 27, 2016 @ 11:47

    This error also happens frequently after updating our QA through CI:

    System.ApplicationException: Could not create an index searcher with the supplied lucene directory ---> System.IO.FileNotFoundException: segments_3 ---> Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (404) Not Found. ---> System.Net.WebException: The remote server returned an error: (404) Not Found.
       at System.Net.HttpWebRequest.GetResponse()
       at System.Net.HttpWebRequest.GetResponse()
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       --- End of inner exception stack trace ---
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       at Examine.Directory.AzureDirectory.AzureDirectory.OpenInput(String name) in d:\w\3\4d52359c6a32a67a\Sources\Project.Web.Core\ExamineAzure\AzureDirectory.cs:line 253
       --- End of inner exception stack trace ---
       at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
       at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
       at UmbracoExamine.UmbracoExamineSearcher.OpenNewReader()
       at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in X:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 288
       --- End of inner exception stack trace ---
       at Umbraco.Core.Cache.HttpRuntimeCacheProvider.GetCacheItem(String cacheKey, Func`1 getCacheItem, Nullable`1 timeout, Boolean isSliding, CacheItemPriority priority, CacheItemRemovedCallback removedCallback, CacheDependency dependency)
       at Umbraco.Core.Cache.HttpRuntimeCacheProvider.GetCacheItem(String cacheKey, Func`1 getCacheItem, Nullable`1 timeout, Boolean isSliding, CacheItemPriority priority, CacheItemRemovedCallback removedCallback, String[] dependentFiles)
       at Umbraco.Core.Cache.DeepCloneRuntimeCacheProvider.GetCacheItem(String cacheKey, Func`1 getCacheItem, Nullable`1 timeout, Boolean isSliding, CacheItemPriority priority, CacheItemRemovedCallback removedCallback, String[] dependentFiles)
       at Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache.GetCacheValues(Int32 id, Func`2 func)
       at Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache.GetUmbracoMedia(Int32 id)
       at Umbraco.Web.PublishedCache.ContextualPublishedCache`1.GetById(Boolean preview, Int32 contentId)
       at Project.Web.Core.PropertyConverters.MultipleMediaPickerConverter.ConvertSourceToObject(PublishedPropertyType propertyType, Object source, Boolean preview) in d:\w\3\4d52359c6a32a67a\Sources\Project.Web.Core\PropertyConverters\MultipleMediaPickerConverter.cs:line 80
       at Umbraco.Web.PublishedCache.XmlPublishedCache.XmlPublishedProperty.<.ctor>b__1()
       at System.Lazy`1.CreateValue()
       at System.Lazy`1.LazyInitValue()
       at Umbraco.Web.PublishedPropertyExtension.GetValue[T](IPublishedProperty property, Boolean withDefaultValue, T defaultValue)
       at Project.Models.ContentModels.Home.get_HeaderImage() in d:\w\3\4d52359c6a32a67a\Sources\Project.Models\ContentModels\Home.generated.cs:line 66
       at Project.Web.Core.Mappings.TypeConverters.HomePage.HomePageToHeroMoleculeConvertor.Convert(ResolutionContext context) in d:\w\3\4d52359c6a32a67a\Sources\Project.Web.Core\Mappings\TypeConverters\HomePage\HomePageToHeroMoleculeConvertor.cs:line 67
       at AutoMapper.MappingExpression`2.<>c__DisplayClass15.<ConvertUsing>b__14(ResolutionContext context)
       at AutoMapper.Mappers.TypeMapMapper.Map(ResolutionContext context, IMappingEngineRunner mapper)
       at AutoMapper.MappingEngine.AutoMapper.IMappingEngineRunner.Map(ResolutionContext context)
       --- End of inner exception stack trace ---
       at AutoMapper.MappingEngine.AutoMapper.IMappingEngineRunner.Map(ResolutionContext context)
       at AutoMapper.MappingEngine.Map[TDestination](Object source, Action`1 opts)
       at Project.Web.Core.Controllers.HomeController.Hero() in d:\w\3\4d52359c6a32a67a\Sources\Project.Web.Core\Controllers\HomeController.cs:line 122
       at lambda_method(Closure , ControllerBase , Object[] )
    

    After clearing the Examine indexes in blob storage and restarting the website, it works again.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 27, 2016 @ 12:04

    I've fixed that now, there's a beta2 of examine you can test: https://www.nuget.org/packages/Examine/0.1.69-beta2

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 27, 2016 @ 12:14

    Thanks, I will try it :-). Is that the same issue as the one I've created here? https://github.com/azure-contrib/AzureDirectory/issues/22

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 27, 2016 @ 12:33

    After the upgrade I still got the error. So I removed the blob container again so it would rebuild and now it seems to be working. Not sure if that was required.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 27, 2016 @ 13:33

    I upgraded to beta2 and moved it to our QA. In the backoffice I rebuilt all the Examine indexes, and they are available in blob storage.

    However, in the log I see the following all the time:

    WARN  Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 4642 from Examine index, reverting to looking up media via legacy library.GetMedia method
    

    This only seems to happen with media. Search with Examine does work. The website seems slower than before.

    Did I configure something wrong in my AzureDirectoryFactory? https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78818-using-azuredirectory-with-examine#comment-251869

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 27, 2016 @ 13:53

    Somehow the issue seems to be only with my InternalIndexer. If I click "Optimize index", nothing happens either. For the other indexes "Optimized?" says true, but for this one it always says false.

    Our QA has a master where we manage Umbraco and a separate frontend.

    Any idea what it could be?


    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 27, 2016 @ 15:46

    The QA frontend server is a WebApp. I added some extra logging. These are the Examine index LocalStorageDirectory paths according to the logs:

    D:\local\Temp\LuceneDir\1b813f5e762be79d3fb91e94aae44db1\App_Data\TEMP\ExamineIndexes\InternalIndexSet
    D:\local\Temp\LuceneDir\1b813f5e762be79d3fb91e94aae44db1\App_Data\TEMP\ExamineIndexes\InternalMemberIndexSet
    D:\local\Temp\LuceneDir\1b813f5e762be79d3fb91e94aae44db1\App_Data\TEMP\ExamineIndexes\ExternalIndexSet
    D:\local\Temp\LuceneDir\1b813f5e762be79d3fb91e94aae44db1\App_Data\TEMP\ExamineIndexes\ContentIndexSet
    D:\local\Temp\LuceneDir\1b813f5e762be79d3fb91e94aae44db1\App_Data\TEMP\ExamineIndexes\VortoContentIndexSet
    

    We use Examine to get ids from the indexes. Here is the code:

    private IEnumerable<int> GetNodeIdsWithAliasUsingExamine(List<string> nodeTypeAliases)
    {
        using (ApplicationContext.Current.ProfilingLogger.TraceDuration<ContentRepository>(
                        "Start get node id's from Examine",
                        "Completed get node id's from Examine"))
        {
            var examineprovider = ExamineManager.Instance.SearchProviderCollection[ConfigKeys.ExternalSearcher];
            if (examineprovider != null)
            {
                var criteria = examineprovider.CreateSearchCriteria();
                if (criteria != null)
                {
                    var fields = new[] { UmbracoContentIndexer.NodeTypeAliasFieldName };
                    var operation = criteria.GroupedOr(fields, nodeTypeAliases.ToArray()).Compile();
                    var searchResults = examineprovider.Search(operation);
    
                    var ids = searchResults.Select(x => x.Id).ToList();
    
                    return ids;
                }
            }
    
            return null;
        }
    }
    

    As you can see we use TraceDuration to see how long this takes. Here are some results from the logging:

    Start get node id's from Examine
    Completed get node id's from Examine (took 365ms)
    
    Start get node id's from Examine
    Completed get node id's from Examine (took 295ms)
    

    For this simple code it seems a bit long. Could it be related to the LocalStorageDirectory paths?

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 28, 2016 @ 07:01

    It shouldn't be slow at all, of course. I wish I had time to help you look into this, but unfortunately I'm swamped at the moment :( Really happy that you are giving this a spin though. I'll try to get my site set up with the local storage sync on Azure soon and see how I go.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 07:14

    I'm doing some more investigation. I also upgraded Umbraco from 7.3.8 to 7.4.3, uSync from 3.0.3 to 3.1.4.740, and uSync.Core from 5.1.0 to 5.3.5.740.

    So maybe it's not related to the Examine 0.1.69-beta2 upgrade, but that was the first thing that came to mind when looking at the logs.

    Will keep you updated.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 09:01

    I can confirm that it's related to AzureDirectory. I can also reproduce it on my local dev. Here are the results if I use the AzureDirectory:

      <add name="InternalIndexer" 
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           supportUnpublished="true"
           supportProtected="true"
           analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"
           directoryFactory="NewHeroes.Web.Core.ExamineAzure.AzureDirectoryFactory, NewHeroes.Web.Core"/>
    
     2016-07-28 10:41:00,785 [P10960/D3/T49] INFO  NewHeroes.Web.Core.Controllers.MasterController - Start user nav custom items action
     2016-07-28 10:41:00,785 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all feedback overview items
     2016-07-28 10:41:00,786 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:41:01,008 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 9860. For nodeTypeAliases: feedbackoverview
     2016-07-28 10:41:01,008 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 222ms)
     2016-07-28 10:41:01,009 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get all feedback overview items (took 222ms)
     2016-07-28 10:41:01,031 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all company details items
     2016-07-28 10:41:01,031 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:41:01,367 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 7832. For nodeTypeAliases: companydetails
     2016-07-28 10:41:01,367 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 336ms)
     2016-07-28 10:41:01,368 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get company details items (took 336ms)
     2016-07-28 10:41:01,390 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all LicenseDetails items
     2016-07-28 10:41:01,390 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:41:01,698 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 7831. For nodeTypeAliases: licensedetails
     2016-07-28 10:41:01,698 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 308ms)
     2016-07-28 10:41:01,698 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get all LicenseDetails items (took 308ms)
     2016-07-28 10:41:01,721 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all usage report items
     2016-07-28 10:41:01,721 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:41:02,068 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 9973. For nodeTypeAliases: usagereport
     2016-07-28 10:41:02,068 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 348ms)
     2016-07-28 10:41:02,069 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get usage report items (took 348ms)
     2016-07-28 10:41:02,091 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all logout items
     2016-07-28 10:41:02,091 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:41:02,407 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 3302. For nodeTypeAliases: logout
     2016-07-28 10:41:02,407 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 315ms)
     2016-07-28 10:41:02,407 [P10960/D3/T49] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get logout overview items (took 315ms)
     2016-07-28 10:41:02,427 [P10960/D3/T49] INFO  NewHeroes.Web.Core.Controllers.MasterController - Completed user nav custom items action (took 1641ms)
    

    And here are the results if AzureDirectory is not used and we use Examine the regular way.

    <add name="InternalIndexer" 
               type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
     2016-07-28 10:53:06,069 [P8420/D3/T35] INFO  NewHeroes.Web.Core.Controllers.MasterController - Start user nav custom items action
     2016-07-28 10:53:06,069 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all feedback overview items
     2016-07-28 10:53:06,069 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:53:06,070 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 9860. For nodeTypeAliases: feedbackoverview
     2016-07-28 10:53:06,070 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 1ms)
     2016-07-28 10:53:06,071 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get all feedback overview items (took 2ms)
     2016-07-28 10:53:06,102 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all company details items
     2016-07-28 10:53:06,102 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:53:06,104 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 7832. For nodeTypeAliases: companydetails
     2016-07-28 10:53:06,104 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 2ms)
     2016-07-28 10:53:06,104 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get company details items (took 2ms)
     2016-07-28 10:53:06,127 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all LicenseDetails items
     2016-07-28 10:53:06,127 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:53:06,129 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 7831. For nodeTypeAliases: licensedetails
     2016-07-28 10:53:06,129 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 1ms)
     2016-07-28 10:53:06,129 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get all LicenseDetails items (took 2ms)
     2016-07-28 10:53:06,157 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all usage report items
     2016-07-28 10:53:06,157 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:53:06,159 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 9973. For nodeTypeAliases: usagereport
     2016-07-28 10:53:06,159 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 1ms)
     2016-07-28 10:53:06,159 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get usage report items (took 2ms)
     2016-07-28 10:53:06,184 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get all logout items
     2016-07-28 10:53:06,184 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Start get node id's from Examine
     2016-07-28 10:53:06,185 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Ids found in Examine: 3302. For nodeTypeAliases: logout
     2016-07-28 10:53:06,185 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get node id's from Examine (took 1ms)
     2016-07-28 10:53:06,186 [P8420/D3/T35] INFO  NewHeroes.Business.Logic.ContentRepository - Completed get logout overview items (took 2ms)
     2016-07-28 10:53:06,253 [P8420/D3/T35] INFO  NewHeroes.Web.Core.Controllers.MasterController - Completed user nav custom items action (took 184ms)
    

    As you can see it's a big difference. I wonder if I do something wrong with my LocalStorageDirectory.
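Adding up the "get node id's from Examine" timings from the two logs shows where the time goes. All figures are copied from the log lines, and both runs return the same number of ids for each of the five queries:

$$
\begin{aligned}
\text{with AzureDirectory:}\quad & 222 + 336 + 308 + 348 + 315 = 1529\ \text{ms} \\
\text{without AzureDirectory:}\quad & 1 + 2 + 1 + 1 + 1 = 6\ \text{ms} \\
\text{ratio:}\quad & 1529 / 6 \approx 255
\end{aligned}
$$

So the Examine queries alone account for 1529 of the 1641 ms the whole action took with AzureDirectory.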

    I downgraded from beta2 to beta1, but that didn't make any difference.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 28, 2016 @ 09:39

    Interesting. Is that only on the first read, or on all reads?

    Reading should only be done from the local disk: the files there are synced from blob storage, so they are not read directly from blob storage. However, if the files haven't been synced locally yet, they need to be lazily transferred from blob storage to the local file system. So on a new site the first read will be slow because of the lazy file syncing.
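The lazy syncing described here is a read-through cache: look locally first, and only on a miss copy the file down from blob storage, after which every read is local. Below is a minimal, hypothetical sketch with an in-memory local store and a delegate standing in for the blob download. `ReadThroughCache` and its members are invented names, not AzureDirectory's API. The remote-fetch counter is one way to check that repeated reads really stay local.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-through cache: the remote store is only touched on a local miss.
public class ReadThroughCache
{
    private readonly Dictionary<string, byte[]> _local = new Dictionary<string, byte[]>();
    private readonly Func<string, byte[]> _fetchRemote;

    // Number of times the remote store was actually read.
    public int RemoteFetches { get; private set; }

    public ReadThroughCache(Func<string, byte[]> fetchRemote)
    {
        _fetchRemote = fetchRemote;
    }

    public byte[] Read(string name)
    {
        byte[] data;
        if (_local.TryGetValue(name, out data))
        {
            return data; // fast path: file already synced locally
        }

        // Slow path, first access only: copy the file down, then keep it locally.
        data = _fetchRemote(name);
        RemoteFetches++;
        _local[name] = data;
        return data;
    }
}
```

If every read is slow, as in the timings earlier in the thread, a counter like `RemoteFetches` staying above one per file would point at the fast path not being taken.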

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 09:41

    It happens on all reads.

    The files are available on the local disk. I will check if maybe somehow it still reads them from the blob storage.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 10:10

    Did some debugging, but I'm no expert at Examine and AzureDirectory. I would like to know if the files are read from the blob storage or the local directory. Any hints on where to look? I've got the AzureDirectory and related files in my own project so I can debug them or add extra logging.

    Any other tips where I can look for performance issues? Some places I could add extra logging?

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 28, 2016 @ 11:04

    That's what I just described :)

    Files are synced from blob storage to the local file system and read from there. If the file already exists on the local file system, it just gets used and blob storage is not accessed.

    You could look in https://github.com/Shazwazza/Examine/blob/master/Examine.Directory.AzureDirectory/AzureIndexInput.cs

    The CacheDirectory should be a local file system path and when lucene files are accessed they should be read from there.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 11:51

    I checked it, and the CacheDirectory is a local file system path. So it seems Lucene is reading the files from there.

    Maybe it is just slow ;-) https://twitter.com/richorama/status/758618845665976320

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jul 28, 2016 @ 11:56
    Shannon Deminick
    0

    No, something is not right. If the Lucene files are being read from local disk, there shouldn't be a bottleneck. If I can reproduce it when I have time, we can figure out where that bottleneck is. Maybe some of the logic in there is firing too often; we'd need to step through the code or add some timing benchmarks to see where the problem lies.
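    As a minimal sketch of such a timing benchmark, assuming nothing beyond `System.Diagnostics.Stopwatch`, a hypothetical `TimingScope` helper can wrap any suspect call and report how long it took:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times the enclosed block and reports the elapsed time
// through a callback, so it can be dropped around suspect calls (opening a
// reader, running a search, ...) without attaching a profiler.
public sealed class TimingScope : IDisposable
{
    private readonly string _label;
    private readonly Action<string, TimeSpan> _report;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();
    private bool _disposed;

    public TimingScope(string label, Action<string, TimeSpan> report)
    {
        _label = label;
        _report = report ?? throw new ArgumentNullException(nameof(report));
    }

    public void Dispose()
    {
        if (_disposed) return; // report only once, even if Dispose is called twice
        _disposed = true;
        _stopwatch.Stop();
        _report(_label, _stopwatch.Elapsed);
    }
}
```

    For example, `using (new TimingScope("search", (label, elapsed) => Console.WriteLine(label + ": " + elapsed.TotalMilliseconds + "ms"))) { /* run the search */ }`. In an Umbraco site the callback could write to the log instead (e.g. via `LogHelper.Info`); Umbraco's own `ProfilingLogger.TraceDuration` serves a similar purpose.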

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 11:59
    Jeroen Breuer
    0

    Ok I'll wait until you can reproduce it. For now I'll disable AzureDirectory. Let me know if I can help with anything.

    Jeroen

  • Jeavon Leopold 3072 posts 13628 karma points MVP 10x admin c-trib
    Jul 28, 2016 @ 19:33
    Jeavon Leopold
    1

    I've got it working well too, and I can optimize my InternalIndexer, but I only have 20 documents, so I will try it with a much bigger site and see how it works. This is really great stuff though!

    Jeroen, what query were you using for your benchmarking, and how many nodes do you have? Could you share the code for that?

  • Jeavon Leopold 3072 posts 13628 karma points MVP 10x admin c-trib
    Jul 28, 2016 @ 19:35
    Jeavon Leopold
    0

    Jeroen, I see you posted the code earlier without the extra logging. Do you have an updated version?

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 19:50
    Jeroen Breuer
    0

    Hi Jeavon,

    Cool that you are also trying to use AzureDirectory. The code I'm using to search Examine is the code I posted here:

    GetNodeIdsWithAliasUsingExamine: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78818-using-azuredirectory-with-examine#comment-252266

    It searches the ExternalSearcher and has over 9000 items with 250 fields in it. It's for the New Heroes project.

    It was so slow with AzureDirectory that I had to disable it for now.

    Did you do the same steps as I did in this topic to set it up? I really want this to work :-).

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 19:59
    Jeroen Breuer
    0

    Here is the full code with the extra logging. It's only a small change.

    private IEnumerable<int> GetNodeIdsWithAliasUsingExamine(List<string> nodeTypeAliases)
    {
        using (ApplicationContext.Current.ProfilingLogger.TraceDuration<ContentRepository>(
                        "Start get node id's from Examine",
                        "Completed get node id's from Examine"))
        {
            var examineprovider = ExamineManager.Instance.SearchProviderCollection[ConfigKeys.ExternalSearcher];
            if (examineprovider != null)
            {
                var criteria = examineprovider.CreateSearchCriteria();
                if (criteria != null)
                {
                    var fields = new[] { UmbracoContentIndexer.NodeTypeAliasFieldName };
                    var operation = criteria.GroupedOr(fields, nodeTypeAliases.ToArray()).Compile();
                    var searchResults = examineprovider.Search(operation);
    
                    var ids = searchResults.Select(x => x.Id).ToList();
                    LogHelper.Info<ContentRepository>(string.Format("Ids found in Examine: {0}. For nodeTypeAliases: {1}", string.Join(",", ids), string.Join(",", nodeTypeAliases)));
    
                    return ids;
                }
            }
    
            return null;
        }
    }
    

    Jeroen

  • Jeavon Leopold 3072 posts 13628 karma points MVP 10x admin c-trib
    Jul 28, 2016 @ 19:57
    Jeavon Leopold
    1

    I have been focused on making sure that the cold boot "0 documents in index" issue is a thing of the past with AzureDirectory, as that's the issue that has plagued me, so I hadn't looked at performance.

    It looks like your query is taking 3 seconds as opposed to 200 milliseconds, which is a lot and doesn't make much sense, as the indexes are on disk the second time around.

    I'll fire this into a project with thousands of documents and try it with your test code and let you know how it performs.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 28, 2016 @ 20:05
    Jeroen Breuer
    0

    Thanks for helping, Jeavon. I still suspect it reads the files directly from blob storage instead of from the local files; it's the only thing that would explain the big difference in performance. But I can't find where in the code this would happen.

    Fixing the cold boot 0 documents issue is also one of the reasons I would like this to work :-).

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Aug 01, 2016 @ 10:05
    Jeroen Breuer
    0

    Hi Jeavon,

    Did you already try AzureDirectory on a larger site?

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Aug 16, 2016 @ 08:29
    Jeroen Breuer
    0

    I tried switching the local storage to RAMDirectory, but the performance is still very slow. So it's not the location of the local files.

    public Directory CreateDirectory(LuceneIndexer indexer, string luceneIndexFolder)
    {
        return new AzureDirectory(
            CloudStorageAccount.Parse(ConfigSettings.AzureConnectionString), 
            ConfigSettings.AzureUmbracoExamineContainerName,
            new RAMDirectory(),
            rootFolder: indexer.IndexSetName);
    }
    

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Aug 16, 2016 @ 10:15
    Shannon Deminick
    1

    Thanks for the update! I'm on holiday for a couple of weeks after this week, so I won't have any time to investigate this before then. I'd suggest either setting breakpoints in the code to discover where the bottleneck is, or adding some profiling logging. Perhaps the logic is somehow reversed and it's not reading only from the local files, or while reading from the local files it is still requesting an unnecessary status from Blob storage.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Aug 16, 2016 @ 10:57
    Jeroen Breuer
    0

    I tried setting breakpoints and adding profiling logging, but I couldn't find the bottleneck. I've already spent quite a lot of time on this and I don't really have time left to investigate, so I guess I'll have to wait until you're back from your holiday. Enjoy!

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Sep 26, 2016 @ 10:07
    Jeroen Breuer
    0

    Hi Shannon and Jeavon,

    Have either of you tried AzureDirectory yet? Did you also have performance issues?

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Sep 26, 2016 @ 10:40
    Shannon Deminick
    0

    Sorry :( I really haven't had any time to try to diagnose it. I really want to, but I seem to have too many other urgent tasks to complete/fix.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Sep 26, 2016 @ 12:00
    Jeroen Breuer
    0

    Hi Shannon,

    Thanks for the reply. I understand you're busy :-). Would you have time to check it if I send you a complete solution with test Azure blob storage to reproduce it?

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Sep 26, 2016 @ 12:20
    Shannon Deminick
    0

    That would be wonderful, you've got my email i think :)

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Oct 04, 2016 @ 09:48
    Jeroen Breuer
    0

    Hi Shannon,

    I've just sent you an email with a complete solution with test Azure blob storage to reproduce it. I thought I would also post parts of the email here so other people can try it too.

    I've created a branch of the 1-1 multilingual example which now also has a search function: https://github.com/jbreuer/1-1-multilingual-example/tree/AzureDirectory

    Steps to reproduce:

    1. Unzip the attachment
    2. Open \Sources\24days\24days.sln and build the solution
    3. Configure the blob storage in the web.config
    4. Run the project from VS
    5. Log in to the Umbraco backoffice: localhost:55988/umbraco/. Username: admin - Password: testtest
    6. Go to the Examine dashboard and rebuild the indexes. This should create the blob storage container for you and the local index files.
    7. Go to the search page with trace enabled: http://localhost:55988/nl/zoeken/?q=hoofdkwartier&umbDebugShowTrace=true
    8. You can see that the search action takes about 200ms. If you remove the directoryFactory from the index providers in ExamineSettings.config and rebuild the Examine indexes, you can see it then takes about 10ms.
    9. All AzureDirectory classes are in \Sources\Umbraco.Extensions\ExamineAzure if you want to debug anything.

    Search in my solution is slow with almost no content. So it's probably not related to the amount of content. Jeavon said he didn't have performance issues so maybe it's something I configured wrong in my solution.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Oct 11, 2016 @ 13:04
    Jeroen Breuer
    0

    I've contacted Richard Astbury, who maintains the AzureDirectory project on GitHub. He downloaded the AzureDirectory branch of the 1-1 multilingual example and is able to reproduce the issue.

    It's still unclear how to fix the performance issues, but here are some interesting findings (parts of a Skype chat):

    • What it's doing is a HEAD request for each blob, to see if it's changed. It detects that the files haven't changed, and then loads them out of the cache. My lucene knowledge isn't that great (I didn't write this library!) but I think that some files are immutable. So perhaps we could always go to the cache for some files. We could also set a cache expiry for the mutable ones. Apparently the segment files are immutable. I thought there was a file which lists all the segments, but I'm not sure. Maybe it just scans the directory (container).
    • When I run the unit tests it doesn't make the HEAD requests to storage for every file. So my guess is that this is a lucene configuration thing.
    • Ah - I think I've found something interesting. If you hold on to an instance of IndexSearcher, the cache seems to work correctly. If you create a new IndexSearcher for every search, it does the head requests. But it seems that IndexSearcher is a singleton.
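    The first idea from the chat (skip the HEAD request for files that never change, and only re-check mutable files after an expiry) could look roughly like the C# sketch below. It is an illustration, not AzureDirectory code: `RemoteCheckPolicy` is a hypothetical name, and it assumes that Lucene writes every index file once except `segments.gen`, which should be verified before relying on it:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical policy deciding when a "has this blob changed?" HEAD request is needed.
// ASSUMPTION: Lucene index files are write-once except "segments.gen".
public sealed class RemoteCheckPolicy
{
    private readonly TimeSpan _mutableExpiry;
    private readonly Dictionary<string, DateTime> _lastChecked =
        new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    public RemoteCheckPolicy(TimeSpan mutableExpiry) { _mutableExpiry = mutableExpiry; }

    private static bool IsMutable(string fileName) =>
        string.Equals(fileName, "segments.gen", StringComparison.OrdinalIgnoreCase);

    // Returns true when the caller should ask blob storage whether the file changed.
    public bool ShouldCheckRemote(string fileName, DateTime nowUtc)
    {
        if (!_lastChecked.TryGetValue(fileName, out var last))
        {
            _lastChecked[fileName] = nowUtc; // first access: check once, remember when
            return true;
        }

        if (!IsMutable(fileName))
            return false; // write-once file: the cached copy stays valid

        if (nowUtc - last < _mutableExpiry)
            return false; // mutable file, but checked recently enough

        _lastChecked[fileName] = nowUtc;
        return true;
    }
}
```

    With a 5-second expiry, `_0.cfs` is checked against blob storage only on first access, while `segments.gen` is re-checked at most once every 5 seconds.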

    So we still haven't found a solution. Shannon, maybe the findings above are helpful to you?

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Oct 14, 2016 @ 07:42
    Jeroen Breuer
    0

    Besides the performance issues it seems that the Examine and AzureDirectory combination is pretty stable. I only get this error in my local log every once in a while:

    2016-10-03 15:35:54,690 [P8280/D2/T7] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=InternalIndexer, NodeId=-1
    System.Exception: Cannot index queue items, the index is currently locked,, IndexSet: InternalIndexSet
    

    Not sure if that's a problem.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Oct 14, 2016 @ 07:43
    Shannon Deminick
    0

    yeah, that's a problem ;)

  • Richard Astbury 2 posts 72 karma points
    Oct 28, 2016 @ 13:37
    Richard Astbury
    1

    Most of the calls to blob storage seem to be triggered by this line in examine:

    https://github.com/Shazwazza/Examine/blob/b5a17b34981ee41067da287232eb6a646bae3193/Projects/Examine/LuceneEngine/LuceneExtensions.cs#L29

        [SecuritySafeCritical]
        public static ReaderStatus GetReaderStatus(this IndexReader reader)
        {
            ReaderStatus status = ReaderStatus.NotCurrent;
            try
            {
                status = reader.IsCurrent() ? ReaderStatus.Current : ReaderStatus.NotCurrent; // <--- this line
            }
            catch (AlreadyClosedException)
            {
                status = ReaderStatus.Closed;
            }
            return status;
        }
    

    I'm wondering (I don't know) whether this should be on a background thread and called periodically.

    I have tried changing this method so that it always returns ReaderStatus.Current, which reduces the number of calls.

    However, one call remains, which is to list the blobs (AzureDirectory.List()).

    This is called (indirectly) by IndexReader.IndexExists() here:

    https://github.com/Shazwazza/Examine/blob/b5a17b34981ee41067da287232eb6a646bae3193/Projects/Examine/LuceneEngine/Providers/LuceneSearcher.cs#L277

        [SecuritySafeCritical]
        private bool ValidateSearcher(bool forceReopen)
        {
            if (!IndexReader.IndexExists(GetLuceneDirectory())) return false; // <-- this line
    
            if (!forceReopen)
            {
                if (_reader == null)
                ...
    

    I wonder whether this result could be cached, as it's an unlikely scenario that an index is deleted while the site is running (again, an assumption).
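    Caching that result could be sketched as follows (hypothetical `CachedIndexExistence` class, not Examine's implementation). It keeps probing while the index is missing, but once the index has been seen to exist it stops calling the probe, which in real code would wrap `IndexReader.IndexExists(directory)`. Like the suggestion above, it assumes the index is not deleted while the site is running; `Reset()` is there for an intentional rebuild:

```csharp
using System;
using System.Threading;

// Hypothetical cache around an expensive "does the index exist?" probe.
// ASSUMPTION: once the index exists it is not deleted while the site runs.
public sealed class CachedIndexExistence
{
    private readonly Func<bool> _probe;
    private int _exists; // 0 = not yet known to exist, 1 = known to exist

    public CachedIndexExistence(Func<bool> probe)
    {
        _probe = probe ?? throw new ArgumentNullException(nameof(probe));
    }

    public bool Exists()
    {
        if (Volatile.Read(ref _exists) == 1)
            return true; // cached: no call to storage

        if (!_probe())
            return false; // still missing: probe again next time

        Volatile.Write(ref _exists, 1);
        return true;
    }

    // Forces the next Exists() call to probe again (e.g. after deleting the index).
    public void Reset() => Volatile.Write(ref _exists, 0);
}
```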

    Commenting out this line stops AzureDirectory from making any calls to blob storage for a search.

    I hope this helps. I'm afraid I'm not an expert on Lucene!

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Dec 07, 2016 @ 16:08
    Darren Ferguson
    0

    @Richard, as there are dual writes to the cache directory and the AzureDirectory, would it not make sense to change the check on the reader status to look at the cache directory, which is presumably quicker?

    I'm just guessing :)

  • Richard Astbury 2 posts 72 karma points
    Dec 07, 2016 @ 16:31
    Richard Astbury
    0

    It might be necessary to read from the azure blobs directly when there is more than one machine, and the index is being written by another machine.

    I don't know how umbraco handles clustering and index writes.

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Dec 07, 2016 @ 16:40
    Darren Ferguson
    0

    A publish would happen on the "master" CMS server and then typically each "slave" would receive a notification to update its cache and indexes.

    I need to look at the code a bit further but I assume it is fine that the check happens against the cache directory of the Azure reader somehow.

    I need a break for a bit first though - this is quite full on :)

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Dec 08, 2016 @ 11:13
    Darren Ferguson
    0

    I'm kind of blocked as the fork of AzureDirectory by Richard here:

    https://github.com/azure-contrib/AzureDirectory

    uses Lucene 3 whereas Examine uses 2.9.4.1

    I guess that the project here https://azuredirectory.codeplex.com/ is still Lucene 2.9.4.1 but doesn't contain some of the other fixes and improvements that Richard has made.

    My rough idea would be to make Examine aware of the Azure Directory object and do any checks on the reader status against the cache directory.

    I tried just commenting/removing the offending code yesterday and it does take queries down from 300ms on a small collection of content to 1ms, so I think it is fair to say that this is the bottleneck.

    But I could do with some input from Shannon because I think making Examine aware of the AzureDirectory kind of defeats the purpose of supporting the factory pattern in the first place.

    In the short term, one could just override the default Lucene Searcher and do some trickery over there. I'm not sure how much benefit has been added in the latest versions of Lucene and AzureDirectory (that doesn't ship with Umbraco) in the course of this thread.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Dec 08, 2016 @ 12:32
    Jeroen Breuer
    0

    Hi Darren,

    Did you check the AzureDirectory version that is in Examine? https://github.com/Shazwazza/Examine/tree/master/Examine.Directory.AzureDirectory

    The commit description is: Adds AzureDirectory backported to work with Lucene 2.9 with the sync changes made and a MutexManagers that is per-directory instance.

    It's also used here: https://github.com/jbreuer/1-1-multilingual-example/tree/AzureDirectory/Sources/Umbraco.Extensions/ExamineAzure

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Dec 16, 2016 @ 08:24
    Jeroen Breuer
    0

    I blogged about how you can have Examine indexes in Azure Blob Storage. It also has a working example: http://24days.in/umbraco-cms/2016/umbraco-edge-case-stories/#examine

    It still has the same performance problems.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jan 02, 2017 @ 15:22
    Jeroen Breuer
    0

    Hello,

    Does anyone have a status update for this thread? Has someone succeeded in improving the performance when using AzureDirectory?

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jan 10, 2017 @ 05:29
    Shannon Deminick
    3

    Hi all,

    I've been working on the Examine.Directory.Sync project - which is very similar to the AzureDirectory stuff and have discovered much of the same things (and much more) as @Richard.

    I've also logged this issue on the AzureDirectory project: https://github.com/azure-contrib/AzureDirectory/issues/23 based on some "interesting" lucene discoveries. Lucene is fairly odd the way that it works in many cases so getting things right isn't always how it seems.

    As @Richard mentions, there are many ways in which Directory.ListAll() or Directory.List() is executed. Every time this happens it lists all files in the folder, and with Blob storage this of course starts getting very slow. This is executed any time:

    • IndexReader.IndexExists() is called
    • IndexReader.IsCurrent() is called

    Both of these methods are executed very often, which is probably a large part of why it is slow. Examine calls these methods as follows:

    • IndexReader.IndexExists() whenever a search is performed and whenever an index writer is created (once per index per app)
    • IndexReader.IsCurrent() whenever a search is performed, to check whether the current index searcher is stale so that it can be reopened if new index operations have been done

    Until now, I naively thought that IndexExists() and IsCurrent() would be very fast, since many Lucene engines call them quite a bit for all sorts of reasons; as it turns out, however, this is not true. I suspect that Lucene 4.x (which is coming out later this year) has dealt with these things in a better way. In any case, I can look into updating Examine to handle this much better.

    Firstly, I will change all calls to IndexReader.IndexExists() in the index searcher so that, once the index is known to exist, it isn't checked again. Then I need to change how the reader is refreshed.

    In the 4.x version of Lucene.Net a SearcherManager/NRTManager does all of this for us, but since we're stuck with Lucene.Net 2.9 and 3.x we don't have that luxury like Java users do. In Examine 2.0-beta these classes have been back ported, but I'm still not confident in them and I'm not sure if they'll survive that release.

    In the latest commits of Examine 0.1.80-beta we already have a large performance improvement where a Commit only happens on a sliding time schedule, see: https://github.com/Shazwazza/Examine/issues/62. I can use the same code to run a fast sliding time schedule that checks whether the reader needs to be reopened. This is basically what Lucene's SearcherManager does! In 0.1.80 we also use NRT readers, which are much faster to open.
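    The idea of checking staleness on a schedule instead of on every search can be sketched like this. `ThrottledStalenessCheck` is a hypothetical name and this is not Examine's code; the probe stands in for `IndexReader.IsCurrent()`, and the current time is passed in so the behaviour is deterministic:

```csharp
using System;

// Hypothetical throttle: the expensive IsCurrent()-style probe runs at most once
// per interval; between probes the reader is assumed to still be current.
public sealed class ThrottledStalenessCheck
{
    private readonly Func<bool> _isCurrentProbe;
    private readonly TimeSpan _interval;
    private readonly object _sync = new object();
    private DateTime _lastProbeUtc = DateTime.MinValue;

    public ThrottledStalenessCheck(Func<bool> isCurrentProbe, TimeSpan interval)
    {
        _isCurrentProbe = isCurrentProbe ?? throw new ArgumentNullException(nameof(isCurrentProbe));
        _interval = interval;
    }

    // Returns true when the searcher should be reopened.
    public bool ShouldReopen(DateTime nowUtc)
    {
        lock (_sync)
        {
            if (nowUtc - _lastProbeUtc < _interval)
                return false; // probed recently: skip the storage round trip

            _lastProbeUtc = nowUtc;
            return !_isCurrentProbe(); // not current => reopen
        }
    }
}
```

    The trade-off is freshness: with a 2-second interval, a search may use a searcher that is up to about 2 seconds out of date, in exchange for at most one storage round trip per interval instead of one per search.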

    So I think getting the last things done in 0.1.80 will make this work well. Stay tuned and I'll let you know when it's ready.

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jan 10, 2017 @ 16:26
    Shannon Deminick
    2

    @jeroen,

    I haven't had a chance to test your solution but I've done a ton of work to (hopefully) make this run fast with blob storage. I've pushed some Nuget packages for you to test, install (or update) in this order:

    Install-Package Examine -Pre -Version 0.1.80-beta02
    Install-Package Examine.Directory.Sync -Pre -Version 1.0.0-beta02
    Install-Package Examine.Directory.AzureDirectory -Pre -Version 1.0.0-beta02
    

    I've updated the AzureDirectory code to use the latest WindowsAzure.Storage package too.

    I've changed quite a lot regarding reading files and IO processing. I'll also make another PR to the original AzureDirectory repo if all of this works as planned. I've tested the SyncDirectory, which is essentially the same thing, but instead of storing the master index in blob storage it stores the master index in a folder (i.e. on Azure websites it would be on a remote file share).

    Please let me know how it goes if you get a chance to test; otherwise I'll see if I can find time tomorrow.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jan 10, 2017 @ 20:51
    Jeroen Breuer
    0

    I couldn't wait so I tested this immediately :-).

    I've upgraded the 1-1 multilingual example AzureDirectory branch to the latest NuGet packages: https://github.com/jbreuer/1-1-multilingual-example/commits/AzureDirectory

    After this I did all the steps to reproduce the performance problem. Now performance is perfect! Sometimes a little spike to 20ms, but most of the time it's so fast it doesn't even show up on the trace log.

    Thank you for your hard work Shannon. I'm really happy you found some time to help us.

    When I try to reindex the content I sometimes get the following error:

    Cannot delete C:\Users\jnb\AppData\Local\Temp\LuceneDir\005b309ffbac5a50adddf56d6cd63a9b\App_Data\TEMP\ExamineIndexes\ExternalIndexSet\_0.cfs
    

    So I probably need to tweak my AzureDirectoryFactory. Maybe it's also possible to ship that with the Examine.Directory.AzureDirectory package?

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jan 11, 2017 @ 00:54
    Shannon Deminick
    0

    Great news :)

    Regarding the "Cannot delete" error you mention: is this happening while you have the debugger attached? If so, this is normal. Lucene is strange in that it expects certain exceptions to be thrown, especially when trying to remove/delete files; when these are locked an exception is thrown and Lucene will gracefully retry. If you are getting this error some other way, please let me know.
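    The retry pattern described here can be illustrated with a simplified C# sketch. It is not Lucene's implementation: `PendingDeleter` and its delete delegate are hypothetical. A failed delete is queued and retried later instead of being treated as fatal. Because the exception is still thrown and then caught, a debugger configured to break on thrown exceptions can stop on it even though nothing is wrong:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: deletes that fail with an IOException (e.g. the file is
// still open elsewhere) are queued and retried later rather than treated as fatal.
public sealed class PendingDeleter
{
    private readonly Action<string> _delete;
    private readonly List<string> _pending = new List<string>();

    public PendingDeleter(Action<string> delete)
    {
        _delete = delete ?? throw new ArgumentNullException(nameof(delete));
    }

    public IReadOnlyList<string> Pending => _pending;

    public void Delete(string fileName)
    {
        RetryPending(); // give earlier failures another chance first
        TryDelete(fileName);
    }

    public void RetryPending()
    {
        var retry = _pending.ToArray();
        _pending.Clear();
        foreach (var name in retry)
            TryDelete(name);
    }

    private void TryDelete(string fileName)
    {
        try
        {
            _delete(fileName);
        }
        catch (IOException)
        {
            _pending.Add(fileName); // still locked: try again on a later call
        }
    }
}
```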

    Oops! I forgot that the AzureDirectoryFactory wasn't part of the code base; I will definitely add that!

    Also note I may merge the Examine.Directory.Sync into the Examine core since there's no additional dependencies.

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jan 11, 2017 @ 03:26
    Shannon Deminick
    0

    I've done a little testing today and there's definitely still some issues with this. I'll need to get the azure storage emulator going on my local machine to figure out the issues. It's not working as expected with regards to index rebuilding, etc... But at least we're closer.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jan 11, 2017 @ 07:54
    Jeroen Breuer
    0

    Hi Shannon,

    Yes the 'cannot delete' error only happens with the debugger attached. Good to know this is normal.

    Great that the AzureDirectoryFactory will be part of the code base. That means we probably don't need to code anything to use AzureDirectory. Only some config changes.

    I think merging Examine.Directory.Sync into the Examine core is a good idea if there are no additional dependencies.

    Let me know if I can help testing with the issues that are left.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jan 12, 2017 @ 00:23
    Shannon Deminick
    0

    The last issue is that I need to more or less rewrite the AzureLock. It's not difficult, but it has issues. I'll make a PR to the original repo for that when I'm done. In the meantime, Examine.Directory.Sync can be merged into Examine core, but that means Examine core will need to be updated from .NET 4.0 to .NET 4.5. This is sort of a breaking change, but since Umbraco already requires .NET 4.5 I won't consider it breaking; I'll document it though.

    Stay tuned.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jan 13, 2017 @ 08:55
    Jeroen Breuer
    0

    I see Examine 0.1.80-beta03 and Examine.Directory.AzureDirectory 1.0.0-beta03 are out on NuGet. Will do some tests this weekend :-).

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jan 16, 2017 @ 20:58
    Jeroen Breuer
    0

    I've upgraded the packages to beta3. I tried to uninstall Examine.Directory.Sync because it's in the core, but I couldn't because Examine.Directory.AzureDirectory 1.0.0-beta03 still depends on it.

    I changed the directory factory to the one shipped with Examine.Directory.AzureDirectory: https://github.com/jbreuer/1-1-multilingual-example/commit/5a15cc9c2ef035e64763a854277aeb511f897346#diff-2047c3df736265061843bec54bd24783

    But that currently throws the following exception:

    Could not load type 'Examine.LuceneEngine.Providers.IDirectoryFactory' from assembly 'Examine, Version=0.1.80.0, Culture=neutral, PublicKeyToken=null'.
    

    So I'm not sure if there is something wrong with my configuration or if some of the packages still need to be updated.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jan 16, 2017 @ 22:40
    Shannon Deminick
    2

    I haven't pushed the new AzureDirectory package yet, which will actually have a new, simplified name. Similarly, the sync directory has a new namespace.

    I have found issues with the AzureLock class which I'm in the middle of fixing, but I'm not sure when I'll get it completed. Until then AzureDirectory won't work properly. I'll try to get you an update within a week.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jan 17, 2017 @ 07:50
    Jeroen Breuer
    0

    Guess I was a bit too eager to try it out. Sorry for that :-). Thanks for the update!

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jan 25, 2017 @ 08:44
    Jeroen Breuer
    0

    @Shannon Do you have an estimation when the new AzureDirectory package will be out? I would like to update some websites next week to use AzureDirectory.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Jan 25, 2017 @ 09:28
    Shannon Deminick
    1

    "out" , haha, no i can't even estimate when i'll have a chance to revisit it at the moment - AzureDirectory locks need to be fixed and I'm only half way through that.

  • Umair 13 posts 75 karma points
    Feb 02, 2017 @ 04:27
    Umair
    0

    @Shannon we would also like to use Azure for Examine, to improve performance and CPU usage when we swap slots after deploying to web apps.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Feb 09, 2017 @ 12:25
    Jeroen Breuer
    0

    Hi Shannon do you have a status update on the new AzureDirectory package? I've got the feeling that it's really close to being finished. Would really like to start using it.

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Feb 09, 2017 @ 22:54
    Shannon Deminick
    1

    I will let you know when I've made any progress, i literally have made zero progress since you last reminded me. Rest assured, you'll be the first to know.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Aug 16, 2017 @ 11:56
    Jeroen Breuer
    0

    Hello,

    I don't think a version will be released where we can store Examine indexes in blob storage using AzureDirectory.

    I think at CodeGarden 17 a nice alternative was shown for Examine on web apps in Azure. It needed some changes to the Examine config files. Anyone got an example?

    Jeroen

  • Shannon Deminick 1524 posts 5270 karma points MVP 2x
    Aug 16, 2017 @ 23:07
    Shannon Deminick
    0

    I have a giant backlog of my own personal project work to do including this one. These aren't forgotten about but I haven't been able to find any time to work on these outside of work hours.

    The code is there, there's some issues to fix up, i'd be more than willing to help you test and update the code. OSS is all about working together on things so if you think this would be very valuable then please help out.

    The docs you are looking for are here:

    https://our.umbraco.org/Documentation/Getting-Started/Setup/Server-Setup/azure-web-apps#examine-v0-1-80

    https://our.umbraco.org/Documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/flexible#examine-v0-1-80

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Aug 18, 2017 @ 07:14
    Jeroen Breuer
    0

    Thanks for the docs.

    I've had a look at the code a while ago, but I couldn't get it fixed. I can help out with testing.

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Aug 16, 2018 @ 08:18
    Jeroen Breuer
    0

    An update about Examine with Azure Directory (Blob Storage) has been posted here: https://our.umbraco.com/forum/developers/api-questions/56303-Status-of-Examine-and-Azure-blob-storage-providers#comment-294969

    Jeroen

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Feb 11, 2020 @ 08:09
    Jeroen Breuer
    0

    Finally a non-alpha version of Blob Storage based Lucene indexes for Examine has been released. More info: https://shazwazza.com/post/examine-and-azure-blob-storage/

    Jeroen
