I've attached a SQL Profiler and I think the reason why is that it is executing a query to get the property value for every single record (~543) in the existingOrgs collection. existingOrgs is of type IEnumerable<IContent>.
What I am doing is attempting to synchronise content between SalesForce and Umbraco, I can't rely on cached content so need to use the ContentService and since that is for CRUD operations that is what I am using. How can I perform the same query but in a much faster and more efficient way - am I missing something completely obvious?
In case it makes a difference the site is running v6.2.1.
I'm not sure how existingOrgs is populated, but assuming it's from a LINQ expression and hasn't been evaluated yet?
With the FirstOrDefault lambda expression - it will be evaluating each item in existingOrgs - so if Umbraco internals are doing multiple SQL calls, hence the performance delay you're experiencing.
Personally, I'd go with swapping it out for a direct SQL call. Here's a snippet that might work...
SELECT TOP 1 n.id
FROM
cmsPropertyData AS pd
INNER JOIN umbracoNode AS n ON n.id = pd.contentNodeId
INNER JOIN cmsDocument AS d ON n.id = d.nodeId
INNER JOIN cmsPropertyType AS t ON t.id = pd.propertytypeid
WHERE
d.newest = 1 AND d.versionId = pd.versionId
AND t.Alias = 'accountId'
AND (pd.dataNvarchar = '1234' OR pd.dataNtext = '1234')
;
Obviously modify this as you need it - replace the '1234' with the actual account.ParentId. Then once you've got the nodeId back from the SQL call, you can get the IContent node itself from the ContentService API.
Thanks Lee, you are correct that it is populated from a unevaluated LINQ expression. I had previously forced the query to evaluate to a list but it didn't really offer much improvement.
I'm a little cautious about using hard coded SQL queries and wondering if I might be better to attempt to use Examine queries on an index that includes unpublished content. I'm going to give that a go first and then if I have little or no success will attempt to implement your solution as my fallback. Unfortunately there are a few instances of where this sort of process is required during the synchronisation of the content - it's all far too slow to be of any use right now and even testing it is massively impacting my time.
I was curious about the direct SQL approach... just had to scratch that developer itch!
Here's a prototype extension method:
public static class ContentServiceExtensions
{
public static IEnumerable<IContent> GetContentByPropertyValue(this IContentService service, string alias, object value)
{
var db = ApplicationContext.Current.DatabaseContext.Database;
var sql = "SELECT n.id FROM cmsPropertyData AS pd INNER JOIN umbracoNode AS n ON n.id = pd.contentNodeId INNER JOIN cmsDocument AS d ON n.id = d.nodeId INNER JOIN cmsPropertyType AS t ON t.id = pd.propertytypeid WHERE d.newest = 1 AND d.versionId = pd.versionId AND t.Alias = @0 AND (pd.dataNvarchar = @1 OR pd.dataNtext = @1);";
var ids = db.Query<int>(sql, alias, value);
return service.GetByIds(ids);
}
}
Thanks Lee, I can actually see this being a potentially useful little extension method to add to the arsenal even if I don't end up using it to solve this particular issue so thanks for the effort!
I'm still fighting with this today and will update the thread when resolved. Ran into a couple of issues and so have wiped out my cache and indexes for the site and due to the size of the site it is still rebuilding! I really need to invest in some new hardware in the new year - my laptop is ok spec but is just not able to keep up again (seems to happen every couple of years).
Will keep you posted - it's turning into a bit of a hybrid solution :)
3 hours, yes 3 hours it took yesterday to rebuild the indexes on the site with my machine running at 100% CPU for most of that time!!! That's not right!?
Anyhow, the process is now much faster using the Internal Examine index to check for the existence of nodes before creating new ones. It's still far from quick for the whole process to complete but much faster than it was before. I will be running this on the live server tonight and so hopefully that has more grunt in it and will complete in a reasonable amount of time. I'm also going to migrate it to a custom data adapter for CMS Import so they can use that for executing or scheduling the process.
Content Service Performance Incredibly Slow when querying on property value
Take the following query which runs for a ridiculously long time, I mean something in the region of ~20 minutes or so!
I've attached a SQL Profiler and I think the reason why is that it is executing a query to get the property value for every single record (~543) in the
existingOrgs
collection.existingOrgs
is of typeIEnumerable<IContent>
.What I am doing is attempting to synchronise content between SalesForce and Umbraco, I can't rely on cached content so need to use the ContentService and since that is for CRUD operations that is what I am using. How can I perform the same query but in a much faster and more efficient way - am I missing something completely obvious?
In case it makes a difference the site is running v6.2.1.
Thanks, Simon
Hi Simon,
I'm not sure how
existingOrgs
is populated, but assuming it's from a LINQ expression and hasn't been evaluated yet?With the
FirstOrDefault
lambda expression - it will be evaluating each item inexistingOrgs
- so if Umbraco internals are doing multiple SQL calls, hence the performance delay you're experiencing.Personally, I'd go with swapping it out for a direct SQL call. Here's a snippet that might work...
Modified from my original Gist #5300219
Obviously modify this as you need it - replace the
'1234'
with the actualaccount.ParentId
. Then once you've got the nodeId back from the SQL call, you can get theIContent
node itself from theContentService
API.I hope this helps.
Cheers,
- Lee
Thanks Lee, you are correct that it is populated from a unevaluated LINQ expression. I had previously forced the query to evaluate to a list but it didn't really offer much improvement.
I'm a little cautious about using hard coded SQL queries and wondering if I might be better to attempt to use Examine queries on an index that includes unpublished content. I'm going to give that a go first and then if I have little or no success will attempt to implement your solution as my fallback. Unfortunately there are a few instances of where this sort of process is required during the synchronisation of the content - it's all far too slow to be of any use right now and even testing it is massively impacting my time.
Cheers, SImon
Examine queries on unpublished content would be super-fast!
I was curious about the direct SQL approach... just had to scratch that developer itch!
Here's a prototype extension method:
Cheers,
- Lee
Thanks Lee, I can actually see this being a potentially useful little extension method to add to the arsenal even if I don't end up using it to solve this particular issue so thanks for the effort!
Cheers, Si
No problem at all, I've already added it to my own utility belt :-)
Cheers,
- Lee
I'm still fighting with this today and will update the thread when resolved. Ran into a couple of issues and so have wiped out my cache and indexes for the site and due to the size of the site it is still rebuilding! I really need to invest in some new hardware in the new year - my laptop is ok spec but is just not able to keep up again (seems to happen every couple of years).
Will keep you posted - it's turning into a bit of a hybrid solution :)
Hey Lee,
3 hours, yes 3 hours it took yesterday to rebuild the indexes on the site with my machine running at 100% CPU for most of that time!!! That's not right!?
Anyhow, the process is now much faster using the Internal Examine index to check for the existence of nodes before creating new ones. It's still far from quick for the whole process to complete but much faster than it was before. I will be running this on the live server tonight and so hopefully that has more grunt in it and will complete in a reasonable amount of time. I'm also going to migrate it to a custom data adapter for CMS Import so they can use that for executing or scheduling the process.
Cheers, Si
is working on a reply...