Query language for Umbraco - what would be most valuable?
Hi everyone,
I've started a pet project exploring the possibilities of creating a natural language query syntax for Umbraco. Currently it does super simple building of IEnumerable funcs, but I'm hoping to make "providers" for Lucene or even PetaPoco too.
It feels like an extreme challenge to keep it simple, yet flexible. So I'm hoping to get some useful feedback from the community on whether it'll be useful at all, and secondly, what it should be able to do. :)
Currently I can go
someDocTypeAlias
and I'll get back a Func<IEnumerable<IPublishedContent>> like so:
I do funcs for now just to keep it simple and test it. Preferably, there would be Lucene Queries built up, or IQueryable expression trees built for PetaPoco. (Or some capable mapper)
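To make the "build a func from a tiny query" idea concrete, here is a minimal sketch. It is not the project's actual code: the `Item` record is a hypothetical stand-in for IPublishedContent so the example runs without Umbraco, the "latest N alias" syntax is an assumed toy grammar, and the func takes its content source as a parameter purely to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IPublishedContent (assumption, not an Umbraco type).
public record Item(string DocTypeAlias, string Name, DateTime UpdateDate);

public static class TinyQuery
{
    // Turns "alias" or "latest N alias" into a func over a content source.
    // The syntax is an assumed toy grammar for illustration only.
    public static Func<IEnumerable<Item>, IEnumerable<Item>> Build(string query)
    {
        var words = query.Split(' ', StringSplitOptions.RemoveEmptyEntries);

        if (words.Length == 1)
        {
            // Bare doc type alias: return every item of that type.
            var alias = words[0];
            return source => source.Where(i => i.DocTypeAlias == alias);
        }

        if (words.Length == 3 && words[0] == "latest" && int.TryParse(words[1], out var count))
        {
            // "latest N alias": filter, order descending by date, then limit.
            var alias = words[2];
            return source => source
                .Where(i => i.DocTypeAlias == alias)
                .OrderByDescending(i => i.UpdateDate)
                .Take(count);
        }

        throw new FormatException($"Cannot parse query: '{query}'");
    }
}
```

Calling `TinyQuery.Build("latest 5 news")` gives back a func that can be run against any sequence of items. A real implementation would presumably use a proper parser rather than splitting on spaces.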
But again, the language syntax is very slim for now, and I really don't want to flesh it out too much until I get a better picture of how it should work.
Boolean and arithmetic operators are "easy" to get in, but I'm struggling with the verbs and adjectives to put in there. I'm thinking 'give me modified x something', and I'm not quite sure about the 'where'. All doors are open for now.
Feedback extremely welcome. :)
As far as naming goes, I'm stuck picking between "UQL" and "UQLele". ><
What is the overall goal you are trying to achieve with the project? A full conversational interface to Umbraco data? Who is the target group? Cortana? :)
How elaborate should the supported queries be? Which date formats do you want to support? Strict ISO notation, or conversational and referential ones, including date and time ranges?
A few quickly made up examples. (Messy notation I know)
"which (entity|node|doctype) was (ACTION) last year on a weekday after lunch"
"who (ACTION) (entity|node|doctype) on 31 January?"
"When will node be unpublished?"
Action could be created|deleted|published etc.
More questions than an actual answer I am afraid... :)
Thanks a lot for your input, Frederik. Just the kind I'm after. Although as you say, there's a lot of questions at the moment. :)
My initial goal for this is to provide a simpler query syntax than lucene or xpath. It could be used by editors to specify source for lists of content. For example in the grid. (There's another package I'm working on)
But it could also be used as a more advanced search in sites with a lot of content. And then the language you're sketching above would give a lot of meaning for analysis and auditing.
Again, the scope can grow way too big for this to be useful in the end, and the challenge lies in scoping it correctly. There's also the option of creating several smaller language constructs.
For queries, I'm not sure traversing the hierarchy makes sense for editors. Maybe "children" would make sense, but ancestors and descendants might be too confusing. Hence the current //docType implementation.
At the moment, I'm not sure I should continue this endeavor, but I can't shake the feeling it'll be useful. (Got a few other similar things to work on)
You are welcome. Maybe focus your dev effort on tying together a COTS or OSS NLP library with your knowledge of Umbraco's inner workings?! It looks like there are a few .NET-based ones available now. (Would have loved those 10 years ago for computational linguistics. :) )
I think what you are attempting is extremely interesting, but also extremely difficult to pull off successfully. I personally wouldn't be interested in using it, since I'm happy with querying Umbraco data using "lambda" syntax - it's precise, it works, and it's in line with how I query data in other projects.
If your audience is non-technical users then I'd suggest you will ultimately require support from Core team and it will need to go into Core. Why? Because it's unlikely non-technical users will find it or download / install it. For it to succeed you'd need great documentation, great awareness and probably low-level support within Umbraco - all best served by collaborating with Core.
If you're looking for inspiration, maybe the LINQ to SQL / EF query syntax would be a good way to look at this? I believe that's all expression tree based, but it shows a way that a more natural language syntax can be parsed into "lambdas".
Example: Imagine you have a NewsItem doctype that sits under a News doctype and you want to get the items created today:
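// Assumes "news" is a sequence of IPublishedContent items.
var todaysNews = from newsItem in news
                 where newsItem.CreateDate.Date == DateTime.Today // "is today"
                 orderby newsItem.CreateDate
                 select newsItem;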
It's probably less "natural" than you might like, but more extensible using something that already works in the real world.
The trouble is that what seems "natural" for a simple query can become a convoluted nightmare for a complex query. The "natural" way can be more difficult to get right than the conventional. And you also need to consider things like language grammar (which varies from culture to culture) and also how could you enable "intellisense" etc. which is a great tool for helping beginners?
I think it's a tall ask. But I can certainly see it's an interesting intellectual exercise that will further your understanding of the language. I guess the question for you is that if it fails will you be happy simply because you have learned something new?
As ever, Lars-Erik, you amaze me with your technical skill and your concern for Umbraco users who aren't as knowledgeable as you are. Tip-top marks for giving this a go!!
I am, however, concerned about how this idea of querying could end up confusing users rather than improving their experience with Umbraco. As you know, there are 3 ways to write Razor code in Umbraco (ancient dynamics, current dynamic, strongly typed) and which you see in response to forum questions (sometimes even a mix of them!) is unfortunately a disservice to users. True, many responders will use the Razor flavour best for the OP but that certainly isn't always the case and searching for solutions is definitely confusing for less-technical users. That's why, as I understand it, the HQ are betting on Models Builder and Property Editor Converters to move to a single way of using Razor that will work both for those who want strongly typed in VS and those who will do simpler code directly in the back office. @Model.Content.BodyText and all that.
Adding a natural language query facility would re-introduce the confusion and difficulty that you and I and the HQ are wanting to avoid.
Rather than a new query language, could I recommend some love be given to the existing Query Builder that is on the toolbar of Partial Views in the back office? It is meant to handle the kind of situation you are proposing as use cases.
What kind of content are you looking for?
Where will you find it in the content tree?
Filter to only include certain items that meet your criteria
Sorting options
It could use some UI love to follow the design style of the rest of the back office. It might benefit from a more wizard-like or step-by-step mode to hold users' hands better, with help and feedback along the way. And then a simple way to either paste in the selection query or a basic @foreach(var item in selection).
And of course it should generate new razor syntax rather than the current dynamics approach. And with the new razor and models builder code the chance to make more complex 'where' clauses without resorting to linq statements might also be possible.
One of the great things about the Query Builder is that you can see a sample of results (and if the query is slow) to be sure you get exactly the query you want before you even run it in your partial view.
What do you think? I don't want to suggest killing a good project but I wonder if a tool to help users learn the rudiments of querying Umbraco content with razor using the Query Builder (or its enhancement/replacement) might be a more useful approach for users and those who help them online.
The first thing that worries me is performance. People are going to use it to build sites and, e.g., run a few queries on each page, and performance might be horrible. Sure, being able to ask for "last 3 modified content of type newsItem" is great, but compiling and processing that query is going to be expensive.
UX-wise, the total n00b will want the back-office query builder, while the El1t3 will want the raw Linq. I am thinking it might introduce an in-between solution with little use?
That being said: querying is a complex beast and I am not happy (enough) with what we can do today. And obviously neither are you, else you would not be looking for solutions.
I believe the proper way to do it is to have a nice(r) LINQ solution + easy(ier) caching mechanism + faster cache (v8) + a query builder UI to make things even easier.
Oh and one thing I'd love to see is a list of the, say, 50 most frequent queries ppl need. Top 3 news items? Menu tree? All products?
Thanks a lot for your input everyone. #h5yr!
I hope I managed to address all questions, concerns and ideas below. My mind is kind of spinning, though. :)
The what
I think I'll have to clarify my original intent with this. What I'm trying to do is create an editor-friendly language to use within document content "settings". Maybe a property editor or a grid control showing the results of a simple query.
So to address the level of the language, it would be above what a front-end dev would do. It's supposed to work and be comprehensible for editors and users.
The analysis parts like @Frederik mentions would be another layer for a later or "bigger" version. I'm not sure I'm ready to go there.
The first place I'd use the language myself is within this grid-editor I hope to release soon: https://github.com/lars-erik/Our.Umbraco.ContentList (see screenshot)
We've got "children of this", "xpath query" and "lucene query" as sources. Editors can use the "children" one, but I'd rather hide the last two. A natural language query on the other hand... (latest 6 news)
I'm using Ace on the client side (which is on its way to Umbraco). It has some fairly reasonable support for "intellisense", so showing doctypes and possibly fields is feasible. Keywords are a given.
UaaQB (UQLele as a Query Builder)
Reg. the query builder. I should really have been aware of it, but somehow I had completely forgotten about it. I don't find myself in the template editor that often. (Go figure)
It's actually doing something quite like what I'm attempting. I've built several like that before, but they're always limited in some way or the other. I guess I never got the architecture quite right. (Got a really nice one for Lucene)
Yet, it is still pretty limited, and the work to extend the UI is in some ways even more than creating a language for it. (It is!) The language could well be an "advanced" mode for the query builder, and the abstract syntax tree (AST) would map easily to a more full-featured UI. The UI could more or less map directly to a parse tree, or even be something to build the AST right off of. (I'm sorry, @Douglas, can't say that in other terms. Bottom line: synergies.)
More technical
The current LINQ translation is just a POC. I'd go with compilation to expression trees and/or Lucene queries in a "finished" product. Having a LINQ Expression and going .ToString() on it actually yields the right query. So there's the query builder for you. :) It could also be provider-based, meaning you could translate it to a Lucene, ContentCache, NPoco or even an Entity Framework query. The input is first parsed to a "parse tree", then compiled into an "abstract syntax tree". The AST nodes are then visited to build queries, or executed like in UFX.
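As a rough illustration of the provider idea, the sketch below walks one small AST twice: once to emit a Lucene-style query string and once to build a LINQ expression tree. Everything here is assumed for the sake of the example: the `Item` record stands in for real content, the node types (`OfType`, `BodyContains`, `And`) are invented, and the Lucene field names are guesses rather than the project's actual mapping.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for a content item (assumption, not an Umbraco type).
public record Item(string DocTypeAlias, string Body);

// A tiny invented AST: a doc type filter, a "contains" test, and AND.
public abstract record Node;
public record OfType(string Alias) : Node;
public record BodyContains(string Text) : Node;
public record And(Node Left, Node Right) : Node;

public static class Providers
{
    // Provider 1: visit the AST and emit a Lucene-style query string.
    // The field names are assumptions for illustration.
    public static string ToLucene(Node node) => node switch
    {
        OfType t => $"+nodeTypeAlias:{t.Alias}",
        BodyContains c => $"+body:{c.Text}",
        And a => $"{ToLucene(a.Left)} {ToLucene(a.Right)}",
        _ => throw new NotSupportedException(node.GetType().Name)
    };

    // Provider 2: visit the same AST and build a LINQ expression tree.
    public static Expression<Func<Item, bool>> ToExpression(Node node)
    {
        var item = Expression.Parameter(typeof(Item), "item");
        return Expression.Lambda<Func<Item, bool>>(Visit(node, item), item);
    }

    private static Expression Visit(Node node, ParameterExpression item) => node switch
    {
        OfType t => Expression.Equal(
            Expression.Property(item, nameof(Item.DocTypeAlias)),
            Expression.Constant(t.Alias)),
        BodyContains c => Expression.Call(
            Expression.Property(item, nameof(Item.Body)),
            typeof(string).GetMethod(nameof(string.Contains), new[] { typeof(string) }),
            Expression.Constant(c.Text)),
        And a => Expression.AndAlso(Visit(a.Left, item), Visit(a.Right, item)),
        _ => throw new NotSupportedException(node.GetType().Name)
    };
}
```

The expression from `ToExpression` can be compiled and run against in-memory items, or printed with `.ToString()` to get a readable form of the query. Adding another backend would mean adding another visitor over the same node types.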
Performance
The compiled queries can easily be cached and looked up after first use. As mentioned, I'd go with expressions or Lucene queries: what gets cached would be the compiled expressions or the final Lucene queries, preferably working against a fast cache.
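A minimal sketch of that caching idea, assuming the query text itself is the cache key. The "compiler" here is a hypothetical placeholder that just builds a substring check, and `CompileCount` exists only to make the effect of the cache observable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Threading;

public static class QueryCache
{
    // Compiled delegates keyed by the raw query text.
    private static readonly ConcurrentDictionary<string, Func<string, bool>> Compiled = new();

    // Counts compilations so the caching effect can be observed.
    public static int CompileCount;

    // Hypothetical "compiler": treats the query text as a word to look for.
    private static Func<string, bool> Compile(string query)
    {
        Interlocked.Increment(ref CompileCount);
        Expression<Func<string, bool>> expr = text => text.Contains(query);
        return expr.Compile(); // the expensive step we want to do only once
    }

    // Returns the cached delegate, compiling it on first use.
    public static Func<string, bool> Get(string query) =>
        Compiled.GetOrAdd(query, Compile);
}
```

Note that `GetOrAdd` may run the factory more than once if several threads ask for the same new key at the same time, so a stricter version might wrap the value in `Lazy<T>`. For a per-page query scenario, the occasional duplicate compilation is probably acceptable.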
Current thoughts
After the initial three hours I spent getting "latest 5 news" out of my mind, I stared blankly at my screen and had no idea where to go from there. So I realised I needed scope and direction. You guys definitely helped me get a bigger picture. I think, given continued motivation, I'll have a look at the new query builder (temp-template-editor branch) and see if I could do an "advanced" version. Building a "dropdown based" UI is just a matter of persistence after the language works.
Compiling a list of the 50 most frequent queries people use and then distilling it to what an editor would understand and use would really be helpful. I'll see about tossing up a small survey.
I'm still not quite confident with the direction, but at least the scope gets smaller and more focused.
New BNF
If you guys'll bear some ENBNF (Erroneous Noob Backus–Naur form), here's a draft of a second attempt:
An expression representing the LINQ query would look something like this:
cache.GetByRoute("/home/blog")
    .Children("blogPost")
    .Where(p =>
        p.GetProperty("content").Value.Contains("umbraco") // case sensitivity etc.
        || p.GetProperty("you get the picture")
    );
A workbench like the one I did with UFX could defo. show the results and timings. Cache aside. :P
Thanks again for feedback so far guys. Hope to keep your interest, or even collab. ;)
There's also support for limiting the results and ordering them descending by date.
A formal BNF currently looks about so:
Code here: https://github.com/lars-erik/our-umbraco-query-language
Some bird on twitter just told me I should underline that this is aimed at non-developers. It's not supposed to be another LINQ to whatever.
Just found this. Maybe you can find some inspiration there as well? http://nlp.abodit.com/
I'm using Irony. Ref. Umbraco Forms Expressions. Same stuff, new wrapping.
What do you think?
cheers,
doug.
Not wanting to rain on the parade but...
Thoughts?
It'd be invariant culture for starters, but those are far off details.
With it, we could go
It'd be translated to a lucene query like
(forgive the left-outs and errors)
And an expression representing the LINQ query