Product Service / API Performance
Hi there,
After evaluating a few e-commerce solutions out there (both Umbraco and non), we decided that we are going to move forward with Merchello as our solution of choice. I will spare you the reasons, only to say that we are impressed with the direction of the project as a whole.
Our first implementation of Merchello will have a product sync feature, and we are estimating approximately 24,000 products and 400,000 variants that will come over initially. To gauge the feasibility of implementing this project in Merchello, I wrote some quick code to generate 1000 products (below), and it took around 10 minutes to complete - and that's just a simple product record with no variant data. Best case, we are looking at 240 minutes (or 4 hours) to insert 24,000 product records.
var svc = Merchello.Core.MerchelloContext.Current.Services.ProductService;
// Load Merchello with 1000 products
for (int i = 1; i <= 1000; i++)
{
var prod = svc.CreateProduct("Product " + i.ToString(), "P" + i.ToString(), 0.01m);
svc.Save(prod);
}
Normally, it is inefficient to call Save() for each insert when performing a bulk operation, but I could not find another method. There is an overload of Save() that accepts an IEnumerable, but that still takes 10 minutes to run (I checked the source, and it simply loops through the collection and calls the repository's AddOrUpdate() method for each instance - see https://github.com/Merchello/Merchello/blob/1.6.1/src/Merchello.Core/Services/ProductService.cs#L248 ). Is there a more performant solution than what I have above?
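For reference, here is roughly how I exercised that overload - a minimal sketch, assuming (as in 1.6.1) that CreateProduct() returns an IProduct and that Save() has an IEnumerable<IProduct> overload:
var svc = Merchello.Core.MerchelloContext.Current.Services.ProductService;
// Build all 1000 products up front, then save the whole collection at once.
var products = new System.Collections.Generic.List<Merchello.Core.Models.IProduct>();
for (int i = 1; i <= 1000; i++)
{
    products.Add(svc.CreateProduct("Product " + i, "P" + i, 0.01m));
}
// The Save(IEnumerable<IProduct>) overload still loops internally and calls
// AddOrUpdate() per item, so this takes just as long as saving one by one.
svc.Save(products);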
Thanks in advance!
Hi Keith,
At the moment we do not have a bulk save/import, but as you have found, you can save collections. To make the most of that, you would probably also want to use shared options (which are possible via the database, but not currently exposed). That could considerably reduce the number of inserts required. However, it would be brittle from a back office perspective if someone edited the options there, as we have not yet implemented the constraint checks to handle that. Those checks are planned for sometime around April.
That being said, bulk importing is something I am very interested in getting implemented - it just needs to be planned so that it is done correctly. Any feedback you have would be really appreciated.
Hello Rusty,
Thank you for your response. We look forward to seeing this product evolve!
@Rusty Now that we are in the thick of things, the performance issues are becoming more and more noticeable. Adding a single product with 70 variants (14 colors and 5 materials - seemingly a fairly common scenario when dealing with variants) takes 2-3 minutes on my dev machine (a high-end machine, but one connecting to a remote database). Pushing to our staging servers only cuts the time in half, so it is still taking 1 minute, best case.
In an attempt to diagnose, I noticed that creating a single product with 70 variants makes approximately 700 database round-trips for a single pass through the loop (i.e. a single product). Obviously this can be improved greatly, and I would love to throw my hat in to get this improved sooner rather than later. Note that it doesn't seem to matter whether I use the service methods or manually create the product in the dashboard - both take the same amount of time and the same number of round-trips.
For 10,000+ products at a minute apiece, we're looking at almost 7 days of uninterrupted run time to get all products in place. We can't have this, especially since it seems like the performance doesn't improve much (if at all) when simply updating pricing on existing variants.
@Keith - It's the object graph that is killing it here, so I'm going to need to put some thought into it. What is happening is that if a product is not found in the Examine index, it falls back to the database and runs queries to build out every variant, which are then all serialized and stored in the Lucene document. I would expect that once the index has been updated, the latency goes away - however, when generating a list of, say, 10 products with 70 variants each, you will wind up with a ton of db queries (if none of them are found in the index).
Importing 10,000 products with variants will have to be a direct-to-db operation at this point (meaning do not use the Merchello services). I've started looking at bulk imports using your quantities as a guideline (though I had not considered a product with so many options / choices), and I think the best approach is going to be to write something that does direct SQL inserts and then have a routine that compensates for any events that should have fired to update caches and whatnot. The biggest thing is going to be the product index: even if you use the Examine management dashboard to rebuild the index, it'll only grab the first 100 products. That being said, 70 variants is still a ton of queries - so definitely something to do some head scratching on.
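To make the direct-to-db idea concrete, here is a minimal sketch of what such an insert could look like, assuming the PetaPoco Database exposed by Umbraco's DatabaseContext and Merchello's merchProduct / merchProductVariant tables - the column list is illustrative only and must be verified against the schema for your version:
// Hedged sketch: direct SQL inserts that bypass the Merchello services.
// Column names are illustrative - check your Merchello schema before use.
var db = Umbraco.Core.ApplicationContext.Current.DatabaseContext.Database;
using (var tx = db.GetTransaction())
{
    var productKey = System.Guid.NewGuid();
    var now = System.DateTime.Now;

    // merchProduct is a thin table; product details live on the master variant.
    db.Execute(
        "INSERT INTO merchProduct (pk, createDate, updateDate) VALUES (@0, @1, @1)",
        productKey, now);

    db.Execute(
        "INSERT INTO merchProductVariant (pk, productKey, name, sku, price, master, createDate, updateDate) " +
        "VALUES (@0, @1, @2, @3, @4, 1, @5, @5)",
        System.Guid.NewGuid(), productKey, "Product 1", "P1", 0.01m, now);

    tx.Complete();
}
// None of the service events fire here, so caches and the Examine product
// index have to be corrected by a separate compensating routine afterwards.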
@Rusty I have pulled down the source, and I am going to make some adjustments so that it does a little more eager loading of the child tables, to see if that improves performance - at minimum I can speed things up for my current project. If I get any quality results, I will share my notes.
@Keith - that would be great. Look forward to hearing what you come up with.
@Rusty - I have completed what I can with regard to performance, specifically when it comes to dealing with products with more than a few variants. Feel free to review this commit within my fork of the repo here: https://github.com/tigreye007/Merchello/commit/9e1fcaa590873c74d6fbbf91b13fe97d031b8bb6
Notes:
Created a CommitBulk() method in the UnitOfWork classes. This is currently a hack, since I don't have time to implement the Persist(IEnumerable<dto>) methods in all of the individual repositories.
Updated ProductService to call Save(IEnumerable) when creating missing variants.
Updated ProductVariantService to no longer commit variant updates on calls to CreateProductVariantWithKey() until all variant DTOs are generated, and to call the new CommitBulk() method instead.
Updated ProductVariantRepository with new Persist methods that accept IEnumerable<IProductVariant>, optimized to the best of my ability to make the fewest round trips.
Updated ProductVariantRepository's PerformGetAll() to more eagerly load product variants, along with their attributes and warehouse info.
Updated ProductVariantFactory to no longer require a ProductAttributeCollection and CatalogInventoryCollection on instantiation.
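The shape of the CommitBulk() idea, paraphrased - the real changes are in the commit linked above; this interface is illustrative only:
// Illustrative only - see the linked commit for the actual implementation.
// The idea: queue inserts instead of committing per variant, then flush once.
public interface IBulkUnitOfWork
{
    void RegisterAdded(object dto); // queue an insert; no database call yet
    void CommitBulk();              // flush everything queued in one batch
}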
On my local/dev machine, it takes ~10-15 seconds to create a product and update its variants, as opposed to the 1-2 minutes it was taking prior to these updates, and I expect performance to improve even more on the staging and production environments.
I didn't create a pull request for this commit because of the slight hackiness of my approach; however, I'm hoping you can use some of this in your own performance updates. Thank you again for your continued work on this product!
@Keith this rocks! I'll pull your code into a local feature branch and take a look. One thing I've been thinking may improve performance is to expose the shared options. This will require a UI change and will take some time to think through - but we've slated it to be part of what we're calling the "Product to Content" refactoring, which is the next big effort after coupons and discounts.
H5YR
Hi Rusty & Keith,
I currently have a prospective client who keeps all their products in a shop EPOS system.
The EPOS system can produce an export of the entire product catalogue, either in their proprietary format or in a Magento API format; I think it can also export just the new and changed products.
So I was interested to see your conversation thread about bulk importing products. My client does not want to do any of the product management within the Umbraco Back Office, as they already import all the data into their EPOS system - hence my need to know whether we can easily bulk import / sync product data.
@Rusty - Have things changed in the core product since Feb 2015? @Keith - How are you getting on with your 24k products, and what do you do when the product catalogue changes - is it a complete rebuild each time?
Cheers, Chris
@Chris - We've added the ability to extend products via an Umbraco content type, similar to Nested Content (but not nested).
Hi Chris,
We do not rebuild the catalog (I assume you mean a complete kill and fill). Fortunately in our situation, the catalog changes only once a year, and things are only added to the catalog (not removed), so the sync process only needs to be run once a year, manually triggered.
Hi Rusty,
Thanks for the quick response - I was really asking about the performance of importing large product data sets.
I'm also interested to know how you envisage keeping data sets in sync with a third-party source - in my example, the client's EPOS system, which they use in their high street shop to manage all of their inventory. It changes quite frequently, so they would be looking to keep it in sync with the website, ideally without needing to log into Umbraco at all.
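What I have in mind is something like a scheduled job that upserts through the Merchello API. A rough sketch of the kind of thing I mean - EposRecord and FindBySku() are hypothetical placeholders for the EPOS export parsing and a SKU lookup; only CreateProduct() and Save() are the ProductService calls used earlier in this thread:
public class EposRecord
{
    public string Sku { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class EposSync
{
    public static void Run(System.Collections.Generic.IEnumerable<EposRecord> export)
    {
        var svc = Merchello.Core.MerchelloContext.Current.Services.ProductService;
        var toSave = new System.Collections.Generic.List<Merchello.Core.Models.IProduct>();

        foreach (var rec in export)
        {
            var existing = FindBySku(rec.Sku); // hypothetical lookup
            if (existing == null)
            {
                // New product: CreateProduct() builds it without saving.
                toSave.Add(svc.CreateProduct(rec.Name, rec.Sku, rec.Price));
            }
            else if (existing.Price != rec.Price)
            {
                // Changed product: update the price and queue it.
                existing.Price = rec.Price;
                toSave.Add(existing);
            }
        }

        // One collection save at the end (see the performance caveats above).
        svc.Save(toSave);
    }

    private static Merchello.Core.Models.IProduct FindBySku(string sku)
    {
        // Hypothetical: replace with whatever lookup your version provides
        // (an Examine query, a cached SKU dictionary, or direct SQL).
        throw new System.NotImplementedException();
    }
}
Something like that could run on a schedule server-side, so nobody would ever need to log into the back office.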
Cheers, Chris
Hi Chris,
We wanted to get the product content stuff implemented before we changed the repositories and unit of work, which was done in 1.12.x. TBH, I had forgotten about this thread, but I will grab the code and play with it for 1.14.0.