
  • Tim 1193 posts 2675 karma points MVP 4x c-trib
    Mar 11, 2013 @ 17:33
    Tim
    1

    Updating Examine Indexes Programmatically

    Hiya,

    I have an Umbraco site that stores data from a 3rd party service in an Examine index. I want the site to update the indexes every hour or so. However, I'm running into an issue: every time we update the indexes, there's a period where they are either empty or only partially populated, so searches return incomplete data.

    Does anyone know of a way where we can update the indexes, without affecting the site? For example, updating the indexes to a duplicate set of indexes every hour and then overwriting the indexes that the searches use once that's done? I'm trying to find the best way to approach this and was wondering if anyone has run into anything similar?

  • Anthony Dang 1404 posts 2558 karma points MVP 3x c-trib
    Mar 11, 2013 @ 17:39
    Anthony Dang
    0

    I'm interested to know the answer to this too.

    Is it the entire index that needs to be recreated or do you know which items should be updated?

     

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Mar 11, 2013 @ 17:51
    Shannon Deminick
    0

    First, I need to know which Umbraco + Examine versions you are using.

    So to verify, what you are trying to achieve is that you want to update some of the index with new data from this 3rd party service on an interval?

    I also need to know what is causing this: "every time we update the indexes, there's a period where the indexes are either empty, or only partially populated, causing incomplete data." How are you updating the indexes currently? How long is the 'period of time'? Does this index contain a ton of data?

  • Botond Levai 5 posts 73 karma points
    Jan 14, 2019 @ 14:17
    Botond Levai
    0

    Hi @Shannon Deminick,

    I was searching on a similar subject and found your post here. I'd like to know whether it is feasible to update some indexes based on 3rd party API calls. I don't know yet whether we could update just the affected items; if not, we would need to update about 20K items once per day (basically rebuilding the whole index collection for this type of item). I should also mention that I'm using the latest version, 0.1.89.

    Could you please suggest a solution for this case?

    Thanks in advance,

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Mar 11, 2013 @ 17:57
    Ismail Mayat
    0

    Tim,

    I guess you have built the custom indexer using SimpleDataIndexer? If so, that means you have to do periodic rebuilds. You've got 2 options:

    1. Write a new indexer that inherits from LuceneIndexer. This will give you update / delete etc. methods so you don't have to do periodic rebuilds. The issue is that it's not easy; I tried it but couldn't get it to work.
    2. The easier option: if you can set up some events on your third party source that you can tap into, then you can add / delete / update the index and you don't need the hourly rebuild. For example, if it's a SQL Server db you're indexing, stick a trigger on the table you're indexing; that trigger needs to call a .NET assembly that has your re-index node code. Code like:
            /// <summary>
            /// Updates an entry in the search index that is related to the post provided by the parameter.
            /// </summary>
            /// <param name="post"></param>
            private void UpdateIndex(Post post)
            {
                var examineNode = post.ToSimpleDataSet().RowData.ToExamineXml(post.Id, "CustomData");
                ExamineManager.Instance.IndexProviderCollection["CustomIndexer"].ReIndexNode(examineNode, "CustomData");
            }

            public SimpleDataSet ToSimpleDataSet()
            {
                var provocationId = String.Empty;
                var jamId = String.Empty;
                var customField = String.Empty;
                if (this.ProvocationId.HasValue)
                {
                    provocationId = this.ProvocationId.Value.ToString(CultureInfo.InvariantCulture);
                }
                jamId = this.JamId.ToString(CultureInfo.InvariantCulture);
                if (this.Author.CustomField != null)
                {
                    customField = this.Author.CustomField.Id.ToString(CultureInfo.InvariantCulture);
                }
                var data = new SimpleDataSet()
                {
                    //create the node definition, ensure that it is the same type as referenced in the config
                    NodeDefinition = new IndexedNode()
                    {
                        NodeId = this.Id,
                        Type = "CustomData"
                    },
                    //add the data to the row
                    RowData = new Dictionary<string, string>()
                    {
                        {"id", this.Id.ToString(CultureInfo.InvariantCulture)},
                        {"title", this.Title},
                        {"body", StringUtil.RemoveHtmlTags(HttpUtility.HtmlDecode(this.Body))},
                        {"jamId", jamId},
                        {"provocationId", provocationId},
                        {"dateUpdated", this.DateCreated.ToString(CultureInfo.InvariantCulture)},
                        {"customField", customField},
                        {"tags", String.Join(",", this.PostTags.Select(x => x.TagId.ToString(CultureInfo.InvariantCulture)).ToArray())},
                        {"topicId", this.Topic.Id.ToString(CultureInfo.InvariantCulture)},
                        {"topicViews", this.Topic.Views.ToString(CultureInfo.InvariantCulture)},
                        {"topicTotalLikes", this.Topic.DeepLikeCount.ToString(CultureInfo.InvariantCulture)},
                        {"topicTotalReplies", this.Topic.DeepReplyCount.ToString(CultureInfo.InvariantCulture)},
                        {"topicDateUpdated", this.Topic.DateCreated.ToString(CultureInfo.InvariantCulture)},
                        {"isActive", this.IsActive.ToString(CultureInfo.InvariantCulture)}
                    }
                };
                return data;
            }

    Obviously in your case the data would come from the db record. Also, in order to update, you will need to store the original db unique key; then, in the case of updates, you can get that record back, delete it, and re-insert it.

    Regards
    Ismail
  • Tim 1193 posts 2675 karma points MVP 4x c-trib
    Mar 11, 2013 @ 21:34
    Tim
    0

    Thanks for the replies guys!

    @Shannon, the site is currently running 4.11.4, but we can upgrade if need be, as the site isn't live yet. We're basically indexing data from a completely separate 3rd party web service that spits out XML. We're using a SimpleDataIndexer, as per the example in the Examine source, except that we connect to the service, get the XML, and convert it to a simple data set before running a full index rebuild. Currently the process runs hourly: a Windows scheduled task calls a /base URL that calls the RebuildIndex method on the whole index. If you search while the reindexing is running, you get either no data or only some data, because the indexing is still in progress. Currently there are around 1000 items in the index, but there will likely be a lot more over time.

    @Ismail, sadly the service is a separate web service with no events we can plug into to update each item individually. I guess I could alter the re-indexing code to compare the edit dates on the web service data, and update only if there's been a change, using the ReIndexNode method? I would also need to add items when there's an addition, and remove them when there are items in the index that are no longer returned by the web service. That would slow down the indexing I guess, but if it solves the issue of all the data disappearing during the update, then I can live with the slower indexing!
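
    A minimal sketch of the comparison described above, assuming each web service record carries a unique ID and an edit date. The `SourceRecord`, `SyncPlan` and `IndexSync` names are hypothetical, and nothing here touches Examine itself; it only works out which IDs need adding, updating or removing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one record returned by the web service.
public sealed class SourceRecord
{
    public int Id { get; set; }
    public DateTime Updated { get; set; }
}

// The three sets of IDs an incremental sync has to act on.
public sealed class SyncPlan
{
    public List<int> ToAdd { get; } = new List<int>();
    public List<int> ToUpdate { get; } = new List<int>();
    public List<int> ToDelete { get; } = new List<int>();
}

public static class IndexSync
{
    // indexed: ID -> edit date currently stored in the index.
    // source:  the records the web service returned on this poll.
    public static SyncPlan Plan(IDictionary<int, DateTime> indexed, IEnumerable<SourceRecord> source)
    {
        var plan = new SyncPlan();
        var seen = new HashSet<int>();

        foreach (var record in source)
        {
            seen.Add(record.Id);

            DateTime indexedDate;
            if (!indexed.TryGetValue(record.Id, out indexedDate))
            {
                plan.ToAdd.Add(record.Id);      // not in the index yet
            }
            else if (record.Updated > indexedDate)
            {
                plan.ToUpdate.Add(record.Id);   // edited since it was indexed
            }
        }

        // Anything indexed that the service no longer returns should be removed.
        plan.ToDelete.AddRange(indexed.Keys.Where(id => !seen.Contains(id)));
        return plan;
    }
}
```

    The IDs in ToAdd and ToUpdate could then be passed through ReIndexNode, as in the UpdateIndex example earlier in the thread, and the IDs in ToDelete through whatever delete method your index provider exposes (check this against your Examine version).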

    @Anthony, at the moment we're just bulk updating everything, but as I mentioned earlier, I could recode the logic to update records individually if need be. It'll just mean changing the code slightly...

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Mar 11, 2013 @ 21:38
    Shannon Deminick
    103

    @Tim... you should avoid doing an index rebuild at all costs. A rebuild should be a one-time-only thing, and after that you should keep your index current by updating only the data that requires updating. Rebuilding deletes the index and then starts from scratch based on your data source. If you update each item based on its ID, then the index will always be there, Examine will just ensure that the item is updated, and this will not affect your live site (apart from the additional CPU usage required, which should be similar to rebuilding the index, if not less). What I should do in the core of Examine, though, is this: if a rebuild index action is executed and the index exists, rebuild an offline index and then overwrite the live one. That isn't implemented right now, and it would still be better to keep your index in sync rather than rebuild.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Mar 11, 2013 @ 22:04
    Ismail Mayat
    1

    Tim,

    Do the records from the web service have a unique key and a last update date? If so, then shove those in the index. Next, get your polling app to fetch the last record ID you indexed, then check whether the web service has IDs greater than that; those are your new records to insert. For the others, you could use the update date to figure out what has been updated, then delete and re-insert those.
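
    A rough sketch of that polling check, assuming the web service IDs only ever increase and that you keep track of when the previous poll ran. `ServiceRecord` and `WatermarkPoll` are hypothetical names for illustration. Note that this check alone does not detect records that were deleted at the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one record returned by the web service.
public sealed class ServiceRecord
{
    public int Id { get; set; }
    public DateTime Updated { get; set; }
}

public static class WatermarkPoll
{
    // Records whose ID is above the highest ID already indexed are new.
    public static List<ServiceRecord> NewRecords(
        IEnumerable<ServiceRecord> fromService, int lastIndexedId)
    {
        return fromService.Where(r => r.Id > lastIndexedId).ToList();
    }

    // Already-indexed records edited since the previous poll
    // are the ones to delete and re-insert.
    public static List<ServiceRecord> ChangedRecords(
        IEnumerable<ServiceRecord> fromService, int lastIndexedId, DateTime lastPoll)
    {
        return fromService
            .Where(r => r.Id <= lastIndexedId && r.Updated > lastPoll)
            .ToList();
    }
}
```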

    Regards

    Ismail

  • Tim 1193 posts 2675 karma points MVP 4x c-trib
    Mar 12, 2013 @ 10:10
    Tim
    0

    Thanks guys, the records do indeed have unique IDs, and we've got them to add an updated date. I'm going to add a new method that runs hourly and updates just the things that have changed, and we'll do a full rebuild maybe once every two weeks. Thanks for all the advice, it's greatly appreciated :)

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Mar 12, 2013 @ 13:51
    Shannon Deminick
    0

    I've added a task for this too:

    http://issues.umbraco.org/issue/U4-1898

  • Tim 1193 posts 2675 karma points MVP 4x c-trib
    Mar 12, 2013 @ 17:36
    Tim
    0

    Have this working now, and it works much better! Thanks again guys, tap me up for a beer at Codegarden :)

  • Rasmus Fjord 675 posts 1566 karma points c-trib
    Feb 23, 2017 @ 07:59
    Rasmus Fjord
    0

    Just wanna high five you guys, this post saved my custom index :)

  • Comment author was deleted

    Dec 17, 2020 @ 09:17

    still relevant :) thanks, will try to wrap this into a blog post on how to index custom data and keep it up to date!
