I finally got down to researching why relations perform so bad in Umbraco. The RelationService is at least unofficially known as the only service you're allowed to use in the front-end, so it should perform at least as good as everything else. Turns out neither relations or relation types were cached.
Fetching relations for any entity creates n*2+1 queries to the DB every time you hit a page loading them. That's for one call. The solution was simply to swap _noCache for _cacheHelper in the default repository factory.
I dunno if the PR is going to be pulled, but if it isn't, only a select few will know that to be able to use relations in a performant way, you'll either have to swap out the default repo factory, or you'll have to create some outer caching strategy per use-case.
While going through this code, I realize that I really have a big issue with the whole repository / cache mechanism.
If I were creating some other solution where my domain entities are persisted using a database, I'd probably use some kind of "enterprise level ORM". I'd map my entities to the DB structure in some way, making sure the models are first-class citizens, and the DB is just support. (You know what I mean)
When "hydrating" the entities, I'd either use a default eager loading strategy per entity, or I'd select an eager loading strategy per use-case. That would mean that the ORM would create a SQL query that left- or inner joined associations as needed, unless it's a really complex graph. In case of the latter, I'd do a couple of queries and merge the results. In most cases though, I'd end up with one query to the DB for each set of entities. (Including lists of-)
In Umbraco, we fetch a list of entities. They're hydrated as DTOs. This list of DTOs are enumerated using the yield keyword so we can do paging. However, for each item in the list of DTOs, a new query is made for the same fields, limited to that entity. It is then (in most cases) mapped 1:1 to a "domain model"/entity. Further, if it's got associations, each association is fetched as DTO, converted to an entity and then added to the outer entity.
Granted, there's caching here, so it only happens once per entity. But still - in a solution with tens or hundreds of thousands entities which are being viewed all the time, we're going to the DB for each single entity the first time it's used. (Unless NullCache is used)
I honestly can't help but think that's a really poor performing solution, even with the caching strategies employed. As far as I know, the cache is also being choked at some point to preserve memory, and we're back to the nasty n*x+1 queries.
Are there any plans to improve this at all, or is this viewed as a really performant and well architected solution?
Interesting. I do understand why Umbraco chose not to use an "Enterprise" ORM (let's be candid - Entity Framework) at the time - it was hard to control, and didn't really target multiple database platforms. PetaPoco seemed like a good bet.
But I think things have changed - EF6 is now extremely good. It can generate extremely efficient queries - often better than any human could. With code first it's very clean - you can use POCOs and there's no horrible XML mapping files to worry about. It has some very clever strategies for dealing with inheritance. It supports async queries. And it has, of course, code migrations too. It's upgradeable via NuGet, and going forward will support .NET Core.
And, yeah, it doesn't support MySQL - but let's be honest, who uses MySQL with .NET? SQL Express fills the free slot and is arguably superior. And you can use LocalDB to, erm, run your DB locally.
Whilst it would be a big change - and something only for v8 - it would be worth considering (with perhaps a simpler DB schema too :p).
Haha, I was deliberately avoiding naming ORMs here. Could've just as well mentioned NHibernate, and possibly others I don't know about. I hope the thread can continue without choosing one. ;)
My hope is that these kinds of queries / eager loading strategies can hydrate the entities without repeating queries for each entity using PetaPoco or NPoco when it takes over. (afaik, that's the plan)
Here's some input on how to do eager loading strategies with the repos. Examples (mostly java/hibernate) can be found by googling eager loading strategy.
I solved it in my project using my own sql query to fetch relations to ensure performance, because I didn't have time to look into yet. And this was a quick performance boost.
I only looked at the relations service but I can imagine similar bottlenecks are present in other services as well.
Yep. I hope this thread can help finding the right direction for repos in v8.
At least something will be done for 7.5.5. ;)
http://issues.umbraco.org/issue/U4-9077
The entire repository / cache mechanism
I finally got down to researching why relations perform so bad in Umbraco. The RelationService is at least unofficially known as the only service you're allowed to use in the front-end, so it should perform at least as good as everything else. Turns out neither relations or relation types were cached.
Fetching relations for any entity creates n*2+1 queries to the DB every time you hit a page loading them. That's for one call. The solution was simply to swap _noCache for _cacheHelper in the default repository factory.
I dunno if the PR is going to be pulled, but if it isn't, only a select few will know that to be able to use relations in a performant way, you'll either have to swap out the default repo factory, or you'll have to create some outer caching strategy per use-case.
While going through this code, I realize that I really have a big issue with the whole repository / cache mechanism.
If I were creating some other solution where my domain entities are persisted using a database, I'd probably use some kind of "enterprise level ORM". I'd map my entities to the DB structure in some way, making sure the models are first-class citizens, and the DB is just support. (You know what I mean)
When "hydrating" the entities, I'd either use a default eager loading strategy per entity, or I'd select an eager loading strategy per use-case. That would mean that the ORM would create a SQL query that left- or inner joined associations as needed, unless it's a really complex graph. In case of the latter, I'd do a couple of queries and merge the results. In most cases though, I'd end up with one query to the DB for each set of entities. (Including lists of-)
In Umbraco, we fetch a list of entities. They're hydrated as DTOs. This list of DTOs are enumerated using the yield keyword so we can do paging. However, for each item in the list of DTOs, a new query is made for the same fields, limited to that entity. It is then (in most cases) mapped 1:1 to a "domain model"/entity. Further, if it's got associations, each association is fetched as DTO, converted to an entity and then added to the outer entity.
Granted, there's caching here, so it only happens once per entity. But still - in a solution with tens or hundreds of thousands entities which are being viewed all the time, we're going to the DB for each single entity the first time it's used. (Unless NullCache is used)
I honestly can't help but think that's a really poor performing solution, even with the caching strategies employed. As far as I know, the cache is also being choked at some point to preserve memory, and we're back to the nasty n*x+1 queries.
Are there any plans to improve this at all, or is this viewed as a really performant and well architected solution?
Interesting. I do understand why Umbraco chose not to use an "Enterprise" ORM (let's be candid - Entity Framework) at the time - it was hard to control, and didn't really target multiple database platforms. PetaPoco seemed like a good bet.
But I think things have changed - EF6 is now extremely good. It can generate extremely efficient queries - often better than any human could. With code first it's very clean - you can use POCOs and there's no horrible XML mapping files to worry about. It has some very clever strategies for dealing with inheritance. It supports async queries. And it has, of course, code migrations too. It's upgradeable via NuGet, and going forward will support .NET Core.
And, yeah, it doesn't support MySQL - but let's be honest, who uses MySQL with .NET? SQL Express fills the free slot and is arguably superior. And you can use LocalDB to, erm, run your DB locally.
Whilst it would be a big change - and something only for v8 - it would be worth considering (with perhaps a simpler DB schema too :p).
Haha, I was deliberately avoiding naming ORMs here. Could've just as well mentioned NHibernate, and possibly others I don't know about. I hope the thread can continue without choosing one. ;)
My hope is that these kinds of queries / eager loading strategies can hydrate the entities without repeating queries for each entity using PetaPoco or NPoco when it takes over. (afaik, that's the plan)
(although I'm in the EF camp, but hush!)
Here's some input on how to do eager loading strategies with the repos. Examples (mostly java/hibernate) can be found by googling eager loading strategy.
http://udidahan.com/2007/03/06/better-domain-driven-design-implementation/
Hi Lars,
I actually reported a issue about this a while ago : http://issues.umbraco.org/issue/U4-8942
I solved it in my project using my own sql query to fetch relations to ensure performance, because I didn't have time to look into yet. And this was a quick performance boost.
I only looked at the relations service but I can imagine similar bottlenecks are present in other services as well.
Dave
Yep. I hope this thread can help finding the right direction for repos in v8.
At least something will be done for 7.5.5. ;) http://issues.umbraco.org/issue/U4-9077
is working on a reply...