Is updating the 301 redirect to evolve to full rewrite engine a good idea?
Hi all,
Recently there has been a Twitter conversation about upgrading the new Umbraco 301 tracking facility into a full URL rewriting facility. I can understand why folks would think this is a good idea, but I would urge everyone to consider a few things...
Firstly, there are a ton of rewrite types. The 301 logic in Umbraco handles only static rewrites. This is a 1:1 mapping; it does not contain any logic for wildcard matching, different status types, different schemes (forcing to https or http), rewrite vs redirect, etc...
The wheel for all of this has already been invented (many times now) and we don't need to re-invent it. This logic is happily baked into IIS Rewrite rules; it's supported by Microsoft and works very, very well. These rules are also baked into the IIS request engine, which means they run before the request even hits ASP.NET's and Umbraco's pipeline, so by and large they will be faster than handling this at the Umbraco request pipeline level.
That said, there is some potential in upgrading Umbraco's 301 engine to work for static rewrites (1:1). But the primary concerns with doing this are:
People will then ask for more features such as all of the ones I mentioned above which is something we will not support in Umbraco core - because this is already invented and working well elsewhere. We don't need to add more complexity to the core to support and maintain logic that exists elsewhere.
This functionality already exists with IIS rewrite rules and will be faster - how much faster, I don't know but I guarantee you that having this execute at the Umbraco request pipeline level will be much slower than IIS rewrites since this rule will only apply to content not found, which means it must attempt to find the content first!
If people want the Umbraco static rewrites to execute before any attempt is made to find content, then the engine needs to be overhauled, and we'd need to modify the db so that each static mapping can be flagged as running before or after content lookup.
So, one proposal is to create an IIS rewrite rule UI that folks can use to manage static mappings. Someone asked on Twitter:
"so, shouldn't the 301 feature have been implemented like that instead?"
No... as per above, the 301 executes if no content is found so this would not work for the 301 engine since IIS rewrite rules will execute before Umbraco even knows a request is executing.
One concern with editing IIS rewrite rules was that if you add a rule, it will restart IIS. My concern would be that your web.config would become huge. This is why an IIS Rewrite static map file should be used instead. We actually do this on Our right now. It's an external file, pointed to by your web.config IIS rules, which contains any number of static maps. Since external rewrite maps use the <rewriteMaps configSource="rewritemaps.config"></rewriteMaps> syntax, a change to the external file will still cause a restart unless we tell the web.config not to monitor external changes.
Two things to note:
1. The IIS Rewrite map can be updated without causing an app pool recycle, but rules will not be picked up until a recycle happens (there's nothing in web.config that says: ignore this file and don't recycle the app pool).
2. We need to test for the existence of the IIS Rewrite module if we are going to allow people to add manual rewrites to an IIS Rewrite map, else you will get a YSOD when IIS Rewrite is not installed in IIS.
Number 2 is another reason why we can't implement this feature completely as an IIS Rewrite thing: not all servers have it installed. The current implementation works for everyone regardless of their IIS plugins.
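To make the external static map idea above a bit more concrete, here is a minimal sketch of how a back-office tool might generate the contents of a rewritemaps.config file from a set of 1:1 mappings. It is an illustration only: the map name StaticRedirects, the example URLs and the function name are assumptions, and a web.config rule would still have to look the map up (for instance with a condition input along the lines of {StaticRedirects:{REQUEST_URI}}).

```python
import xml.etree.ElementTree as ET


def build_rewrite_map(map_name, redirects):
    """Return the XML text of an external rewrite map file for 1:1 redirects."""
    root = ET.Element("rewriteMaps")
    rewrite_map = ET.SubElement(root, "rewriteMap", {"name": map_name})
    # Sort the keys so the generated file is stable between runs.
    for old_url in sorted(redirects):
        ET.SubElement(rewrite_map, "add", {"key": old_url, "value": redirects[old_url]})
    return ET.tostring(root, encoding="unicode")


# Hypothetical map name and URLs, purely for illustration.
xml_text = build_rewrite_map(
    "StaticRedirects",
    {"/old-page": "/new-page", "/promo": "/campaigns/summer"},
)
print(xml_text)
```

Writing the file is the easy part; as noted above, the rules are not necessarily picked up until a recycle happens, and the config is only valid when the IIS Rewrite module is installed.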
We have been discussing this a lot internally as we need an editor UI to create redirects backed by a performant solution that works in Azure.
We have been considering a Rewrite Provider as the likely best approach as it would bring the best of all worlds: IIS performance, but still the ability to store what/how we like in the DB (of course with some caching within the rewrite provider).
Through our research we also came across UrlRewrite.Net, which is a very interesting project that is in itself an IIS module (not an HTTP module) but with some interesting additional features such as being able to "Register your own .Net classes as conditions, actions or operations allowing you to include complex business logic in your rewriter rules."
Thinking further ahead (.NET Core) we also came across Owin.UrlRewrite, which unfortunately doesn't look to be a very active project anymore, and I'm not sure how performant it would be with IIS as things stand now (still interesting to consider though)...
Looks good. Especially the part: "why use this" mentioning "Order of magnitude faster than the standard Microsoft rewriter module because it is written to be very efficient".
Another issue with this is that the IIS rewrite engine is kind of fragile. Add two rewrite rules with the same name, and your site will return a lovely blank white screen. Same if you add two identical static rewrite maps. If the combined size of your web.config and rewritemaps files hits the default limit for config files, same thing, and you need to make some registry changes to get it working beyond that point.
As Jeavon mentioned, you can write custom providers for the rewriter, which might be an option?
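Given how unforgiving the rewrite config is about duplicate names, any tool that writes to it would probably want to validate before saving. The following is a rough sketch of such a check, not something IIS or Umbraco ships: the function name and the sample config are made up for illustration, and it only covers the two cases mentioned above (duplicate rule names and duplicate rewrite map names).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_names(config_xml):
    """Return a list of duplicated rule names and rewrite map names."""
    root = ET.fromstring(config_xml.strip())
    problems = []
    rule_names = Counter(rule.get("name") for rule in root.iter("rule"))
    problems += [f"duplicate rule name: {n}" for n, c in rule_names.items() if c > 1]
    map_names = Counter(m.get("name") for m in root.iter("rewriteMap"))
    problems += [f"duplicate map name: {n}" for n, c in map_names.items() if c > 1]
    return problems


# Hypothetical config fragment with one duplicated rule and one duplicated map.
sample = """
<rewrite>
  <rewriteMaps>
    <rewriteMap name="StaticRedirects" />
    <rewriteMap name="StaticRedirects" />
  </rewriteMaps>
  <rules>
    <rule name="Redirects" />
    <rule name="Redirects" />
    <rule name="ForceHttps" />
  </rules>
</rewrite>
"""
print(find_duplicate_names(sample))
```

A check like this would not help with the config size limit, which would still need to be handled separately.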
I seem to recall that we landed in some trouble when adding too many redirect rules to an IIS Rewrite file. Even if that file was an external one, and not all rules were in the web.config.
As some others mentioned, it might not be the perfect solution to rely on IIS Rewrite being installed on the web server. Could this be implemented as a separate package? So you could install it if you have IIS Rewrite, and roll your own solution if you don't? Is there any way to detect if the module is installed? Problem is, if you add the config, and the module is not there, then I think the site will just crash because of an invalid web.config?
That's correct. I can attest from a number of times running into this problem that the error message you get is particularly uninformative for diagnosing it too!
In addition to many of the pain points Tim above has mentioned, you have to implement a pretty funky hack in order to avoid 301 chaining. In addition, debugging and reasoning with your ruleset as a developer is painful and frustrating.
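For a plain table of 1:1 redirects, one simple form of chaining (A redirects to B, which redirects to C) can be avoided by flattening the table before it is used. The sketch below is only an illustration of that idea under the assumption that the redirects are held as a dictionary of old URL to new URL; it is not the hack referred to above and does not cover chains produced by several independent IIS rules firing one after another.

```python
def flatten_redirects(redirects):
    """Point every old URL straight at its final target; raise on loops."""
    flattened = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[start] = target
    return flattened


# Hypothetical URLs: /a -> /b -> /c collapses to /a -> /c and /b -> /c.
print(flatten_redirects({"/a": "/b", "/b": "/c", "/x": "/y"}))
```

Each old URL then answers with a single 301 to its final destination, and a loop is reported when the table is built rather than discovered by a browser.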
Around a year ago I wrote a POC for an alternative URL rewriting tool to try and solve some of the deficiencies detailed above, available here. It provided a fluent configuration interface that would allow a developer to specify a list of rules that would run in a particular order, including .CSX support (meaning it could be defined in a file and changed without a full project recompile/deployment). Have a look at some of the tests to get a better picture of some of my ideas.
At the retreat in 2015 I raised the need for there to be a simple 301 built into core. I think it's important to re-iterate why it should be in core and what it was trying to achieve.
The main problem I wanted to fix was this:
-- start of problem --
When renaming or moving a node in the back office you create a new url to that node (as the url is generated from the node name and not a unique identifier such as an id; this is on purpose so it's nicer on the eye and adds some SEO goodness). As a result any existing hard coded links to that node are now broken, and in fact the whole history that that link ever existed is gone forever :(
I thought it quite reasonable that out of the box a content management system should be able to track when you actually manage your content by remembering what you used to call it and allow old links (often out of your control on other sites) to keep working.
Ideally users/editors shouldn't need to worry about this happening, it should "just work"...simply because this was 2015 and what crazy ass CMS wouldn't be doing something like this by now???
-- end of problem --
As far as I'm concerned the existing solution does just that. It tracks changes in content and leaves a history of url changes that allows it to find you the right content if you've moved/renamed it.
As a bonus it's given us a nice table to "possibly" add other bits into in the future. However, it is not a complete redirect rule engine, nor should it be. There are packages which handle managing large amounts of redirects, but the discussion here seems to be: should this be done in core?
The world of redirects can get real messy real quick, so it's important to know what problems the core 301 was trying to fix and which ones it was not.
Renaming/moving content to keep urls delivering content - Yep
Migrating an old site to a new site so needing to map 1000s of urls from old to new - Nope, use IIS for that or other
Setting up short links for promotions in newspapers etc - Possibly, better to use a url alias (built in already) or even a macro or IIS Rewrite again
Control http > https etc. - Nope, IIS here, this is server setup stuff and not the job of the CMS
Control a canonical domain - Nope, IIS again or SEO Checker
Something crazy that you are cooking up - Nope, the clue is in the name, it's crazy...you are cooking it up...
Regarding those use cases I think it's fair to say that the 301 does what it was intended to do. To expand on it too much more would take it way beyond where it was meant to be, possibly needlessly; there are other ways to do it.
We should really let Core focus on other Core features unless someone can tell me what this feature should be doing that I'm missing? I don't think we should beat it to do what it wasn't supposed to do.
I think this is spot on. Complex redirection does not have to be the responsibility of the CMS. Url change tracking on the other hand makes perfect sense.
Packages can solve any other needs, without interfering with the core CMS.
While the current implementation does exactly as Pete says above, it's got such a nice UI that admins are bound to ask "why can't I add here".
I really don't fancy repeating all of the above reasons to explain why this is a "CMS internal 301 feature only".
I also really don't like having to have someone who can navigate an XML file, not to mention figure out which form to use when adding redirects to urlrewriting.config.
I definitely don't enjoy spending my own time on it.
But I haven't had a chance to try out the 301 package, so will do that.
I appreciate that IIS will do this quicker than Umbraco. But I'm sure savvy and observant devs will start adding to the built-in 301 table. It's probably also just a question of time before someone actually creates a package to add stuff there.
Honestly, even with all the reasons provided, I can't see why the current core implementation shouldn't allow that. For most sites it won't be a performance issue. And when it becomes a performance issue, I'm sure there are more things helping to drag it down. If well documented, maybe even in the UI, everyone would know it has its limits, and that options B, C and D are better when you reach a certain limit.
Did anyone profile the existing solution on a site with 10k+ articles where 40% of them were renamed during authoring or QA? Did it mess everything up and have to be replaced? Did it actually run swell? Are we making storms in teacups?
To prevent too much bikeshedding on this one, we essentially have a few options:
1. We implement static rewrites within Umbraco, utilizing the current strategy and UI that the 301 redirect logic uses. This would mean:
   - It must be very clear that this engine will never support complex redirects or rewrites. It will purely do 1:1 redirects and that is all.
   - There are then 2 options:
     a. Continue using the current logic. This means that any static redirects will not execute if another Umbraco document matches the URL, because the redirect logic occurs when no content is found.
     b. Or ... implement a new switch that will perform the static lookup and redirect before any content matching occurs. To do this there is some work to be done:
        - The database will need to be updated to have an additional column in the redirect table with a "redirectType" (or similar); this could be an integer for speedier lookups.
        - The 2 redirect types we'll have (for now) will be: Not Found and Static Rewrite. The current 301 redirect content finder will only look for Not Found types.
        - A new IContentFinder would need to be created to query for Static Rewrite types, and this content finder would need to be the first finder in the list. It will make a db query (and cache the result) to determine if a static rewrite matches the URL and if so it will perform the redirect.
   - If people would like a UI for editing complex redirects/rewrites, etc... then it will be up to the community to build a package to deal with this, either utilizing IIS Rewrites or 3rd party libraries.
2. Do nothing in core and leave all of this up to package devs, because the core cannot support all of the flexibility and requirements in one place for all redirect/rewrite stuff.
So it comes down to exactly what people want Umbraco core to do. If it's just 1:1 static redirects then sure, we can put that task as Up-for-grabs and integrate it into a 7.x minor release.
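The lookup order proposed in option 1.b can be modelled in a few lines. This is a language-neutral sketch written in Python, not Umbraco's IContentFinder API: the class names, the integer values of the redirect types and the example URLs are all assumptions, and a real implementation would query and cache the redirect table rather than build a dictionary per request.

```python
from dataclasses import dataclass
from enum import IntEnum


class RedirectType(IntEnum):
    NOT_FOUND = 0       # apply only when no content matches the URL
    STATIC_REWRITE = 1  # apply before any content matching


@dataclass(frozen=True)
class Redirect:
    old_url: str
    new_url: str
    redirect_type: RedirectType


def resolve(url, redirects, content_urls):
    """Return ("redirect", target), ("content", url) or ("404", url)."""
    lookup = {(r.redirect_type, r.old_url): r.new_url for r in redirects}
    # First finder: static rewrites win even if content exists at the URL.
    if (RedirectType.STATIC_REWRITE, url) in lookup:
        return ("redirect", lookup[(RedirectType.STATIC_REWRITE, url)])
    # Normal content matching.
    if url in content_urls:
        return ("content", url)
    # Last finder: the existing not-found redirect tracking.
    if (RedirectType.NOT_FOUND, url) in lookup:
        return ("redirect", lookup[(RedirectType.NOT_FOUND, url)])
    return ("404", url)


# Hypothetical data for illustration.
redirects = [
    Redirect("/promo", "/campaigns/summer", RedirectType.STATIC_REWRITE),
    Redirect("/about-us", "/about", RedirectType.NOT_FOUND),
    Redirect("/news", "/blog", RedirectType.NOT_FOUND),
]
content_urls = {"/promo", "/about", "/news"}

for url in ("/promo", "/about-us", "/news", "/missing"):
    print(url, resolve(url, redirects, content_urls))
```

The /news case shows the behaviour of option 1.a for comparison: a Not Found redirect is ignored as long as a document still matches the URL, which is exactly why static mappings would need their own type and an earlier finder.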
While it would be nice to be able to add 1:1 redirects in the back office, we typically implement a bunch of 301s using URL Rewrite when we launch a redesigned site. The current functionality of 7.5 complements this as it handles changes after the redesign is launched. Once in a while we miss a few and need to add some, and some projects have more complex rewrite rules (I recently made a custom module to handle a specific requirement). But for the most part, I think it works fine as is and I would vote to keep anything too complex out of core.
I also think it would be possible for someone to extend the IIS Rewrite module with something database driven and integrated with a backoffice UI. But that should be a separate project, IMHO.
I'd vote for option 1.b above (allowing 1:1 static redirects, which would execute as the first content finder in the list). It seems a shame to have all the current 301 redirect management goodness in the core without adding this simple feature.
A custom solution should be employed for anything more complex; be it IIS, http module or package. NB I used the Simple 301 package in a recent site, which supports Regex-based pattern matching as well as redirect loop detection. It is well written and caches the redirects collection in memory.
Another alternative is to create a custom IIS Rewrite provider, which means we could store values in any location (database, files, etc...) and not worry about restarts.
All questions, comments, etc... welcome :)
I'm 90% sure if using UrlRewrite.Net (see below) then you don't have the problem of IIS Rewrite not being installed on the server....?
Well it's then wonderful that we can detect this problem before adding rewrites. :-)
Andy
Debugging IIS Rewrite rules is extremely painful, and not for the faint of heart.......
Pete *
Sounds good to me.
1 would solve a couple of common use-cases for us:
a. Add top 50-100 organic search hits pointing to old urls when migrating
b. "Oh bugger, we sent the wrong URL to printing-company XYZ. Can we make that URL work for this landing page?" (yep, happens almost every time)
Alex