  • Thomas 160 posts 335 karma points
    Dec 07, 2020 @ 18:53
    Thomas
    0

    Faster way to query large amount of articles for news front page site

    Hi all, as part of a news site redesign for a customer, I need your opinion on the fastest way to query articles from different categories for the front page of the site. The current site is extremely slow, and I am looking for a better approach. Note that the results must be paginated, just like on other news sites and blogs.

    Best regards

    Thomas

  • Amir Khan 1287 posts 2744 karma points
    Dec 07, 2020 @ 19:33
    Amir Khan
    0

    How are your categories set up and what is the current query?

  • Thomas 160 posts 335 karma points
    Dec 07, 2020 @ 19:59
    Thomas
    0

    The categories are a document type that corresponds to the menu items, which can be visible or hidden. I have ten visible categories at level 2 and roughly another ten that are not visible in the main menu, but a widget on the front page retrieves articles from those categories.

  • Thomas 160 posts 335 karma points
    Dec 07, 2020 @ 20:06
    Thomas
    0

    The current query is as follows. I call the function GetArticlesFromCategory. Each category has a unique name so that I can avoid using IDs. I first retrieve the category node and then iterate over all the children of that node. I know this is not the optimal solution, but it was my second Umbraco site, ten years ago. The code I am using follows. I would appreciate any suggestions for an optimal solution that retrieves the articles as fast as possible.

    Best Regards,

    Thomas

    private static List<Node> GetArticlesFromCategory(string categoryUniqueName, out Node parentNode)
        {
            List<Node> foundNodes = new List<Node>();
    
            parentNode = FReportConstants.CurrentHome.GetDescendantNodesByType("Categories").Where(n => n.HasProperty("uniqueName") && n.GetProperty("uniqueName").Value == categoryUniqueName).SingleOrDefault();
    
            if (null != parentNode)
            {
                foundNodes = GetAllNodesByType(parentNode.Id, "Article");
            }
    
            return foundNodes;
        }
    
        private static List<Node> GetAllNodesByType(int NodeId, string typeName)
        {
            List<Node> foundNodes = new List<Node>();
            var node = new Node(NodeId);
    
            foreach (Node childNode in node.Children)
            {
                var child = childNode;
                if (child.NodeTypeAlias == typeName)
                {
                    foundNodes.Add(child);
                }
    
                if (child.Children.Count > 0)
                    GetAllNodesByType(ref foundNodes, child, typeName);
            }
    
            return foundNodes.OrderByArticleUpdateDate().ToList();
        }
    
        private static void GetAllNodesByType(ref List<Node> foundNodes, Node node, string typeName)
        {
            foreach (Node childNode in node.Children)
            {
                var child = childNode;
                if (child.NodeTypeAlias == typeName)
                {
                    foundNodes.Add(child);
                }
    
                if (child.Children.Count > 0)
                    GetAllNodesByType(ref foundNodes, child, typeName);
            }
        }
    
  • David Brendel 792 posts 2970 karma points MVP 3x c-trib
    Dec 09, 2020 @ 08:48
    David Brendel
    100

    Hi Thomas,

    I would definitely leverage Examine for this kind of searching, displaying and paging. It's way faster than just querying descendants. You can then do some grouping. If you tweak the indexing of your articles a bit, you could index the category on the article and then filter or group directly on that. Examine also has paging built in, which is quite fast.

    So I would recommend having a look at it.

    Regards David

  • Thomas 160 posts 335 karma points
    Dec 10, 2020 @ 09:46
    Thomas
    0

    Following David Brendel's suggestion, this is what I did. I would like your opinion on whether it is correct or needs more optimization.

    First I created a model to store the results and return them to the view.

    public class ArticlesSearchModel
    {
        public string NodeAlias { get; set; }
        public int CurrentPage { get; set; }
        public int ParentId { get; set; }
        public int ItemsPerPage { get; set; }
        public int TotalItems { get; set; }
        public int TotalPages { get; set; }
        public List<NewsArticle> Articles { get; set; }
    
        public ArticlesSearchModel()
        {
            CurrentPage = 1;
            ItemsPerPage = 20;
        }
    }
    

    Then I created an interface, a component and a composer to make it available through IoC.

    [RuntimeLevel(MinLevel = RuntimeLevel.Run)]
    public class SubscribeToSearchArticlesComponentComposer : IUserComposer
    {
        public void Compose(Composition composition)
        {
            composition.Register<ISearchArticles, SearchArticlesComponent>();
        }
    }
    
    public interface ISearchArticles
    {
        ArticlesSearchModel GetArticles(ArticlesSearchModel model);
    }
    
    public class SearchArticlesComponent : ISearchArticles
    {
        private readonly IUmbracoContextFactory _umbracoContextFactory;
        private readonly IExamineManager _examineManager;
        private readonly ILogger _logService;
    
        public SearchArticlesComponent(IUmbracoContextFactory umbracoContextFactory, IExamineManager examineManager, ILogger logService)
        {
            _umbracoContextFactory = umbracoContextFactory;
            _examineManager = examineManager;
            _logService = logService;
        }
    
        public ArticlesSearchModel GetArticles(ArticlesSearchModel model)
        {
            if (string.IsNullOrEmpty(model.NodeAlias))
            {
                return null;
            }
    
            // Get the external index with error checking
            if (!_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out IIndex index))
            {
                throw new InvalidOperationException($"No index found by name {Constants.UmbracoIndexes.ExternalIndexName}");
            }
    
            try
            {
                var searcher = index.GetSearcher();
                var criteria = searcher.CreateQuery(IndexTypes.Content);
                var examineQuery = criteria.NodeTypeAlias("newsArticle").And();
                examineQuery.Field("parentID", model.ParentId).And();
                examineQuery.All().OrderByDescending(new SortableField[] { new SortableField("articleUpdateDate") });
    
                int pageSize = model.ItemsPerPage;
                int pageIndex = model.CurrentPage - 1;
    
                ISearchResults searchResult = examineQuery.All().Execute(maxResults: pageSize * (pageIndex + 1));
                IEnumerable<ISearchResult> pagedResults = searchResult.Skip(pageIndex * pageSize);
                int totalResults = Convert.ToInt32(searchResult.TotalItemCount);
    
                model.TotalItems = totalResults;
                model.TotalPages = (totalResults + model.ItemsPerPage - 1) / model.ItemsPerPage;
    
                if (pagedResults != null && pagedResults.Count() > 0)
                {
                    model.Articles = GetArticlesFromSearch(pagedResults);
                }
            }
            catch (Exception e)
            {
                _logService.Error(GetType(), "Search | Exception: {0} | Message: {1}", e.InnerException != null ? e.InnerException.ToString() : "", e.Message != null ? e.Message.ToString() : "");
            }
    
            return model;
        }
    
        private List<NewsArticle> GetArticlesFromSearch(IEnumerable<ISearchResult> pagedResults)
        {
            List<NewsArticle> articles = new List<NewsArticle>();
            using (UmbracoContextReference umbracoContextReference = _umbracoContextFactory.EnsureUmbracoContext())
            {
                foreach (ISearchResult result in pagedResults)
                {
                    if (int.TryParse(result.Id, out int nodeId))
                    {
                        IPublishedContentCache contentHelper = umbracoContextReference.UmbracoContext.Content;
                        if (contentHelper.GetById(nodeId) is NewsArticle article)
                        {
                            articles.Add(article);
                        }
                    }
                }
            }
    
            return articles;
        }
    }
    

    Finally, to get the articles in the view, I resolve the service from the container in the view page as follows:

    @using Current = Umbraco.Web.Composing.Current;
    @{
    Layout = "InnerMasterLayout.cshtml";
    int currentPage = string.IsNullOrEmpty(Request["p"]) ? 1 : Convert.ToInt32(Request["p"]);
    var results = Current.Factory.GetInstance<ISearchArticles>().GetArticles(new FinancialReport.UmbLibrary.Models.ArticlesSearchModel() { ParentId = Model.Id, NodeAlias = "newsArticle", CurrentPage = currentPage });}
    

    I tested it, and it retrieves the articles faster than the method I used in the old Umbraco site. The paging also works perfectly. However, I still need to test it on a category with more than 10,000 articles, because the categories I tested have almost 1,000 articles each.

    I would appreciate any suggestions for even better performance.

    Thanks

    Thomas
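
    The paging arithmetic in the post above can be isolated into a small helper, which also makes the page parameter safer to parse. This is a minimal sketch in plain C#; the `Paging` class and its method names are hypothetical and do not come from the thread or from Umbraco.

```csharp
using System;

// Hypothetical helper (not part of Umbraco or Examine) that mirrors the
// paging arithmetic used in GetArticles and in the view.
public static class Paging
{
    // Parses the "p" query-string value. Falls back to page 1 when the value
    // is missing, not a number, or not positive, instead of throwing.
    public static int ParsePage(string raw)
    {
        return int.TryParse(raw, out int page) && page > 0 ? page : 1;
    }

    // Ceiling division: the number of pages needed to show all items.
    public static int TotalPages(int totalItems, int itemsPerPage)
    {
        if (itemsPerPage <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(itemsPerPage));
        }

        return (totalItems + itemsPerPage - 1) / itemsPerPage;
    }

    // Number of results to skip to reach the first item of the page (1-based).
    public static int Skip(int currentPage, int itemsPerPage)
    {
        return (currentPage - 1) * itemsPerPage;
    }

    // Number of results that must be fetched so the requested page is included.
    public static int MaxResults(int currentPage, int itemsPerPage)
    {
        return currentPage * itemsPerPage;
    }
}
```

    In the view this would replace the Convert.ToInt32 call with `Paging.ParsePage(Request["p"])`, so a malformed `p` value shows page 1 rather than raising an exception. Note that `MaxResults` grows with the page number, so deep pages ask the searcher for more results than early ones.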

  • Thomas 160 posts 335 karma points
    Dec 10, 2020 @ 10:31
    Thomas
    0

    Now I have another problem. More precisely, it is not a problem; I just don't know how to do it and I need help. The sorting is not working because, as far as I can tell, the custom property "newsArticleUpdateDate" that I am using does not exist in the Lucene index and is not marked as sortable.

    Can anyone explain how to add the property to the index? From what I have read in several articles, the config file does not exist in Umbraco 8; it was probably for Umbraco 7.

    Thanks in advance

    Thomas

  • David Brendel 792 posts 2970 karma points MVP 3x c-trib
    Dec 10, 2020 @ 10:49
    David Brendel
    0

    Hi Thomas,

    In general I would suggest using a custom controller for your page and doing the searching there rather than in the view. But I think this will at least get you up and running more quickly.

    For adding custom values to the index there are events on the ExamineManager that you can use. Have a look at the documentation here.

    Basically what you need to do is this part: e.ValueSet.TryAdd("combinedField", combinedFields.ToString());

    With "combinedField" being your custom date field that you want to sort on. So it would probably look something like: e.ValueSet.TryAdd("newsArticleUpdateDate", "the date");.

    You can have a look at the fields indexed, as I would think that all properties are actually indexed if they are configured on the DocumentType.

    Regards David
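
    If the property is already in the index, as suggested above, the missing piece may only be a field definition that marks it as a date so it can be sorted. The following is a rough sketch for Umbraco 8 with Examine 1.x, not a confirmed fix for this thread. The class names `SortableDateComposer` and `SortableDateComponent` are hypothetical, and the field name is taken from Thomas's post.

```csharp
using Examine;
using Umbraco.Core;
using Umbraco.Core.Composing;

// Hypothetical composer: registers the component below at startup.
public class SortableDateComposer : ComponentComposer<SortableDateComponent>, IUserComposer
{
}

// Hypothetical component: declares the custom date property as a DateTime
// field on the external index so that it can be used for sorting.
public class SortableDateComponent : IComponent
{
    private readonly IExamineManager _examineManager;

    public SortableDateComponent(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    public void Initialize()
    {
        // Do nothing if the external index is not available.
        if (!_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out IIndex index))
        {
            return;
        }

        // Assumption: "newsArticleUpdateDate" is the property alias on the
        // article document type, as named in the thread.
        index.FieldDefinitionCollection.TryAdd(
            new FieldDefinition("newsArticleUpdateDate", FieldDefinitionTypes.DateTime));
    }

    public void Terminate()
    {
    }
}
```

    The index would need to be rebuilt after this change before the new field type takes effect. Sorting on a date field defined this way probably also needs an explicit sort type, for example `new SortableField("newsArticleUpdateDate", SortType.Long)`; treat that as an assumption to verify against the Examine documentation.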

  • Jesper Ordrup 1019 posts 1528 karma points MVP
    Dec 10, 2020 @ 11:06
    Jesper Ordrup
    0

    Edit: somehow I didn't see all the other answers before writing this :-)

    Hi Thomas,

    If it's really slow, it could be because you are hitting the database, for example if you're using the backend API. Make sure you use the IPublishedContent collections. "This is the way", and it works really well, even for very large sites.

    https://our.umbraco.com/documentation/Reference/Querying/IPublishedContent/Collections/

    If you need it to be really, really fast for very large content, then I suggest you query the Examine index. There you can also customize how and what you index (on save), making it even faster to query.

    https://our.umbraco.com/documentation/reference/searching/examine/overview-explanation

    /Jesper

  • Thomas 160 posts 335 karma points
    Dec 11, 2020 @ 12:22
    Thomas
    0

    Well, I created a new index with my custom fields following this example.

    https://gist.github.com/Shazwazza/3d32f4f37d9adadfe56400d0c24db6bd

    When I added the custom field ParentID with FieldDefinitionTypes.Int, I got an error as soon as I queried Examine: "Could not perform a range query on the field parentID, it's value type is Examine.LuceneEngine.Indexing.FullTextType". I tried testing the index in Luke and couldn't retrieve the data. I then changed the field to FieldDefinitionTypes.FullTextSortable. Now the code runs and does not throw any exceptions, but it returns no results. When I take the query generated by the code and run it in Luke, I do get results. Any idea?

    The query generated by the code is the following:

    +ParentID:2051 +NodeTypeAlias:newsarticle

    The code I am using to query Examine is the following:

    try
            {
                var searcher = index.GetSearcher();
                var criteria = searcher.CreateQuery(IndexTypes.Content);
    
                var examineQuery = criteria.Field("ParentID", model.ParentId.ToString());
                examineQuery.And().Field("NodeTypeAlias", "newsArticle");
                examineQuery.OrderByDescending(new SortableField[] { new SortableField("NewsArticleUpdateDate") });
    
                int pageSize = model.ItemsPerPage;
                int pageIndex = model.CurrentPage - 1;
    
                ISearchResults searchResult = examineQuery.Execute(maxResults: pageSize * (pageIndex + 1));
                IEnumerable<ISearchResult> pagedResults = searchResult.Skip(pageIndex * pageSize);
                int totalResults = Convert.ToInt32(searchResult.TotalItemCount);
    
                model.TotalItems = totalResults;
                model.TotalPages = (totalResults + model.ItemsPerPage - 1) / model.ItemsPerPage;
    
                if (pagedResults != null && pagedResults.Count() > 0)
                {
                    model.Articles = GetArticlesFromSearch(pagedResults);
                }
            }
            catch (Exception e)
            {
                _logService.Error(GetType(), "Search | Exception: {0} | Message: {1}", e.InnerException != null ? e.InnerException.ToString() : "", e.Message != null ? e.Message.ToString() : "");
            }
    
  • Thomas 160 posts 335 karma points
    Dec 11, 2020 @ 13:07
    Thomas
    0

    I finally found out why it does not return records. The problem is the following line:

    var criteria = searcher.CreateQuery(IndexTypes.Content);

    With my custom index, the parameter "IndexTypes.Content" is not required. Without the parameter it works.
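
    For reference, the working query can be written as one fluent chain with no category argument. This is a sketch using the field names from the posts above; the class and method names (`CustomIndexQuerySketch`, `GetPage`) are hypothetical. A likely explanation for the fix, which the thread does not confirm, is that the category argument restricts the search to items indexed under that category, while the items in the custom index were stored under a different one.

```csharp
using System.Collections.Generic;
using Examine;
using Examine.Search;

// Hypothetical helper showing the query from the thread as a single chain.
public static class CustomIndexQuerySketch
{
    // "searcher" is assumed to come from the custom index via index.GetSearcher().
    // "currentPage" is 1-based, as in ArticlesSearchModel.
    public static IEnumerable<ISearchResult> GetPage(
        ISearcher searcher, int parentId, int currentPage, int itemsPerPage)
    {
        // CreateQuery() is called without a category, as in the final fix.
        ISearchResults results = searcher.CreateQuery()
            .Field("ParentID", parentId.ToString())
            .And().Field("NodeTypeAlias", "newsArticle")
            .OrderByDescending(new SortableField("NewsArticleUpdateDate"))
            .Execute(maxResults: itemsPerPage * currentPage);

        // Skip the results that belong to the earlier pages.
        return results.Skip(itemsPerPage * (currentPage - 1));
    }
}
```

    Chaining the calls keeps the filter, the sort order and the execution in a single expression, so the query that is executed is the one that carries the ordering.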
