
  • Sam Moore 27 posts 112 karma points
    Jun 20, 2011 @ 21:51
    Sam Moore
    0

    Examine & 20K+ nodes - indexing issues

    Hi there everyone!

    I've been using umbraco for about a year now, but I've only recently started using Examine for searching. I'm having an issue that I can't find any info on, either on this site or through Google. The gist is that I have A LOT of nodes, and it appears that the Examine index queue is not being fully processed. I've tried the usual solutions (deleting the ExamineIndexes folder and restarting the application pool, saving and publishing a node to trigger an optimization, etc.), but nothing seems to work. I'm looking at my `ExamineIndexes\LAZ\Queue` folder right now and it has 7 folders (1-5, 9, 10). They all have about 500 items in them, with the exception of 10, which has about 4800 items. If I publish a node and then recheck the folders, one of them will drop by a few hundred items, but then the progress stops.

    The only error I'm getting in error logs is when I publish certain articles that exist in different areas of the tree with the same name:

    Error adding to SiteMapProvider: System.InvalidOperationException: Multiple nodes with the same URL '/the-facts-on-file-companion-to-british-poetry-before-1600/sonnet-31.aspx' were found. XmlSiteMapProvider requires that sitemap nodes have unique URLs.
       at System.Web.StaticSiteMapProvider.AddNode(SiteMapNode node, SiteMapNode parentNode)
       at umbraco.presentation.nodeFactory.UmbracoSiteMapProvider.loadNodes(String parentId, SiteMapNode parentNode)

    I'm using umbraco 4.5.2, Lucene.Net Version 2.9.2.2, UmbracoExamine Version 0.10.0.292, and Examine Version 0.10.0.292.

     

    Any help would be greatly appreciated. I can also post any other details you need.

    Thanks!

  • Lee Kelleher 4020 posts 15802 karma points MVP 13x admin c-trib
    Jun 21, 2011 @ 11:28
    Lee Kelleher
    0

    Hi Sam,

    Are you using the "UmbracoSiteMapProvider" on your site? If not, then try removing it from your Web.config (e.g. comment it out).
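
    A rough sketch of what commenting it out might look like. Only the provider type name comes from the error log above; the element layout and the other attribute values are assumptions based on a typical umbraco 4.x Web.config, so match them against what is actually in your file:

    ```xml
    <system.web>
      <!-- Commented out because the site does not use the site map provider.
           Attribute values below are assumed; keep whatever your own file contains.
      <siteMap defaultProvider="UmbracoSiteMapProvider" enabled="true">
        <providers>
          <clear />
          <add name="UmbracoSiteMapProvider"
               type="umbraco.presentation.nodeFactory.UmbracoSiteMapProvider"
               defaultDescriptionAlias="description"
               securityTrimmingEnabled="true" />
        </providers>
      </siteMap>
      -->
    </system.web>
    ```

    With the section commented out, the provider is no longer loaded, so the "Multiple nodes with the same URL" error should stop appearing in the log. Whether that also unblocks the index queue is a separate question.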

    Then do what you tried again (clear index queue, restart app-pool, republish a page to trigger indexing).

    Let us know if that works.

    Cheers, Lee.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Jun 21, 2011 @ 12:03
    Ismail Mayat
    0

    Sam,

    Does it seem to stop processing at the same item? I had an issue before where it had a problem with one of the nodes, and that would choke the whole thing. I can't remember what the problem was with that node, but once I fixed it, it all worked.

    Also, do you have the latest Examine version from CodePlex? If not, I would try that as well. I'm not sure whether the version you are using is the standard one you get with an umbraco 4.5.2 install; I suspect it is, so you may want to get the latest. Please note that I think there are a couple of config changes in the latest version, but take a look at the readme file, which should point out the changes.

    Regards

    Ismail

  • Sam Moore 27 posts 112 karma points
    Jun 21, 2011 @ 14:11
    Sam Moore
    0

    @Lee - I'm not using SiteMapProvider at all. I will disable it and see if that fixes the issue.

    @Ismail - There are way too many items to be sure that it's stopped at the same one, but I don't really think it is because it will jump around and remove entire folders from the Queue before it stops. Also, I'm using the 1.1 RTM stable release from codeplex, not the default version included in 4.5.2.

  • Sam Moore 27 posts 112 karma points
    Jun 22, 2011 @ 22:43
    Sam Moore
    0

    @Lee & @Ismail

    I've tried all of your suggestions and I'm still finding that it only gets through about half of the items (if I'm lucky). I also tried using the admin tools from the package repository, but I was unable to get the indexing to complete. On application start it goes through and creates 10 folders, each with approximately 500 items, except for the 10th folder, which has ~15000 items. The 10th folder is usually the one that never ends up getting processed, so all of those items are missing from the index.

    Any ideas?

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Jun 23, 2011 @ 10:37
    Ismail Mayat
    0

    Sam,

    What happens when you rebuild the internal index? Can you try a rebuild on that to see if it works?

    Regards

    Ismail

  • Sam Moore 27 posts 112 karma points
    Jun 23, 2011 @ 14:42
    Sam Moore
    0

    @Ismail - I tried to rebuild the InternalIndexer, but nothing shows up in my internal search anyway. If I look in the Queue folder for the InternalIndex, I see the same 10 folders that showed up in my CustomIndex Queue folder, with 500 items each and then 20000 items in folder number 10. My search works fine for the custom IndexSet that I created; it just only returns results for the things that it indexed (which is only about half of the content). Here is my ExamineSettings.config:

    <?xml version="1.0"?>
    <Examine>
      <ExamineIndexProviders>
        <providers>
          <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               runAsync="true"
               supportUnpublished="true"
               supportProtected="true"
               interval="10"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

          <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
               runAsync="true"
               supportUnpublished="true"
               supportProtected="true"
               interval="10"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

          <add name="LAZIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               runAsync="true"
               supportUnpublished="false"
               supportProtected="true"
               interval="10"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
               enableDefaultEventHandler="true"/>
        </providers>
      </ExamineIndexProviders>

      <ExamineSearchProviders defaultProvider="InternalSearcher">
        <providers>
          <add name="InternalSearcher"
               type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

          <add name="InternalMemberSearcher"
               type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
               enableLeadingWildcards="true"/>

          <add name="LAZSearcher"
               type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
        </providers>
      </ExamineSearchProviders>
    </Examine>

    I'm not sure if I should be using `Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net` or `UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine` but I'm tending to think that I should use the latter as it's the one used in the latest examples on codeplex.
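
    For what it's worth, these two values are not alternatives; they fill different attributes. `type` names the provider class and `analyzer` names the Lucene analyzer that provider uses, as the `InternalSearcher` entry in the config already shows. A minimal sketch of `LAZSearcher` with both attributes set, assuming the searcher should use the same analyzer as `LAZIndexer` (that choice is an assumption, not something confirmed in this thread):

    ```xml
    <!-- Sketch only: the LAZSearcher entry with an explicit analyzer added.
         Using StandardAnalyzer here is an assumption, chosen to match LAZIndexer. -->
    <add name="LAZSearcher"
         type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
    ```

    Keeping the index-time and search-time analyzers the same generally gives more predictable matches, but it is unlikely to be related to the queue not being processed.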

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Jun 28, 2011 @ 10:29
    Ismail Mayat
    0

    There is a new version of Examine out. Have you tried it to see if it fixes your problem?

    Regards

     

    Ismail

  • Kostas Athanasopoulos 4 posts 25 karma points
    Jul 20, 2011 @ 00:02
    Kostas Athanasopoulos
    1

    Hello Sam.

    I was wondering if you managed to create your full index.

    We migrated our .NET sites to umbraco 4.5.2 and have also added search capabilities. We also use version 1.1 of Examine, but indexing never seems to make it to the end. We always have leftovers in the queue folder, and we are trying to find a way to make the indexing engine continue processing the items in the queue (which is rather large, around 300 MB). Note that with the assemblies shipped with version 4.5.2 of umbraco, indexing was triggered and completed on its own after 3 hours (we soon had to change the Examine version, though, because everything was indexed in lower case!).

     

    Dear Ismail,
    We have also tried the latest version of Examine (1.2), with disastrous results (I can post a list of the logged errors; it was not stable or scalable at all).

    Do you have any suggestions on how to index large sites? We currently publish folder by folder, with each folder containing fewer than 1000 nodes.

     

     

     
