
  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 00:57
    Alexander Gräf
    0

    Razor with unnecessary HTML entities

    Razor outputs all text with HTML encoding, unless Html.Raw is used. For example:

    <h1>@Model.PageTitle</h1>
    

    This automatically HTML-encodes the contents of PageTitle, which is fine. However, Razor also encodes a number of other characters, such as German umlauts, as numeric HTML entities. That is unnecessary, because these characters can be represented perfectly well in UTF-8. The only characters that really need entities are usually <, > and &.

    Is there a way to configure this behavior?

  • Huw Reddick 1929 posts 6697 karma points MVP 2x c-trib
    Jan 21, 2021 @ 08:50
    Huw Reddick
    0

    This is, as they say, by design. When Razor renders strings, it automatically HTML-encodes them as a security measure.

    I would advise doing this rather than using Html.Raw

    @(new HtmlString(stringWithMarkup))

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 09:43
    Alexander Gräf
    0

    I don't think it is "by design", and it is also not a safety feature, as any and all characters outside the ASCII range get encoded to a numeric entity without a check whether the page encoding could carry the character without being encoded in an entity.

    Imagine you're doing Chinese texts, this would waste a good amount of space and bandwidth just to encode every single character into an entity, while browsers could easily read it as UTF-8 or UTF-16.
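
    To put rough numbers on that, here is a minimal sketch. ToNumericEntities is a hypothetical helper written only for this comparison: it imitates the decimal numeric entities seen in the output, it is not the encoder Razor actually uses, and it does not escape <, > or &.

    ```csharp
    using System.Globalization;
    using System.Text;

    public static class EntityBloat
    {
        // Hypothetical helper (not a real encoder): writes every non-ASCII
        // code point as a decimal numeric entity and copies ASCII unchanged.
        // Assumes well-formed UTF-16 input (no unpaired surrogates).
        public static string ToNumericEntities(string text)
        {
            var sb = new StringBuilder();
            for (int i = 0; i < text.Length; i++)
            {
                int codePoint = char.ConvertToUtf32(text, i);
                if (char.IsHighSurrogate(text[i]))
                {
                    i++; // the pair was consumed as one code point
                }

                if (codePoint < 128)
                {
                    sb.Append((char)codePoint);
                }
                else
                {
                    sb.Append("&#")
                      .Append(codePoint.ToString(CultureInfo.InvariantCulture))
                      .Append(';');
                }
            }
            return sb.ToString();
        }

        // Number of bytes the text occupies when sent as UTF-8.
        public static int Utf8Bytes(string text) => Encoding.UTF8.GetByteCount(text);
    }
    ```

    For the two characters 中文, Utf8Bytes gives 6 bytes for the raw text and 16 bytes for the entity form (&#20013;&#25991;), so the entity form is more than twice the size.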

    So, is there a way to influence the behavior, or do I actually have to provide my own encoding, and thus completely defeat the purpose of this feature?

  • Huw Reddick 1929 posts 6697 karma points MVP 2x c-trib
    Jan 21, 2021 @ 10:54
    Huw Reddick
    0

    It is by design: Razor encodes strings automatically. Speak to Microsoft if you disagree. However, Razor should not break umlauts or other non-English text; I have an MVC Razor site that quite happily displays Arabic.

    Just a guess, but maybe the issue is that Umbraco should not be encoding pagetitle as you are basically encoding the output twice

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 12:35
    Alexander Gräf
    0

    It's only a single conversion: & is displayed fine, with a single entity encoding. Razor is just encoding as if the output were ASCII-only and every character outside the ASCII range therefore had to be written as an entity.

    Razor generally encoding strings is fine, it's just the problem that it encodes characters that wouldn't really need any encoding at all.

  • Huw Reddick 1929 posts 6697 karma points MVP 2x c-trib
    Jan 21, 2021 @ 13:05
    Huw Reddick
    0

    I can't say I have had this issue before. Do you have an example of some characters that are rendered incorrectly? I will run some tests in my other app to see whether it behaves the same.

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 13:13
    Alexander Gräf
    0

    Well, all the German äöü ÄÖÜ ß characters, but also stuff like ®. The only characters that actually require encoding would be <, > and &.

  • Huw Reddick 1929 posts 6697 karma points MVP 2x c-trib
    Jan 21, 2021 @ 13:35
    Huw Reddick
    0

    I've never had issues with German characters, but yes you could have issues with copyright, trademark etc. because razor will encode the & which is part of the &copy; if they are entered that way.

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 13:42
    Alexander Gräf
    0

    Why would & be part of the trademark symbol? And if you were to enter &copy;, that would put the literal text &copy; into the output, because the & gets encoded as &amp;, like this:

    <title>Homepage &amp;copy;</title>
    

    Not sure you understand how encoding works. Anyway, still looking for a way to configure Razor encoding specifics.

  • Huw Reddick 1929 posts 6697 karma points MVP 2x c-trib
    Jan 21, 2021 @ 14:08
    Huw Reddick
    1

    I understand perfectly how encoding works, no need to be insulting.

    I said that if you entered &trade; in a string, Razor would encode the & and display &trade; rather than ™.

    Razor should not be encoding äöü ÄÖÜ ß. Are you sure it is Razor doing the encoding? What does the raw pageTitle string look like in the database?

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 14:13
    Alexander Gräf
    0

    I created a plain Razor app in VS to test it: I added a string property to the code-behind that returns special characters and put it into the page with @Model.Test. It also gets encoded to HTML entities, although as hexadecimal numeric entities rather than decimal ones, probably because the app is .NET Core and not .NET Framework.

    I'm trying to investigate. The compiled version of the page calls RazorPageBase.Write to output the string, and that uses an instance of System.Text.Encodings.Web.HtmlEncoder to encode the string. Will investigate further on the call chain.

    Code behind:

        public string Test
        {
            get {  return "Dies ist ein Test ÄÖÜ äöü ß"; }
        }
    

    Output:

        <h1>Dies ist ein Test &#xC4;&#xD6;&#xDC; &#xE4;&#xF6;&#xFC; &#xDF;</h1>
    
  • Huw Reddick 1929 posts 6697 karma points MVP 2x c-trib
    Jan 21, 2021 @ 14:24
    Huw Reddick
    0

    Weird, I just tried the same as you and got a completely different result: no encoding.

    In your Razor, did you just do @model.Test? If you do @Html.Raw(model.Test), what does that output?

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 14:30
    Alexander Gräf
    1

    So, I dug a bit deeper, and the API, at least in .NET Core is:

    services.Configure<WebEncoderOptions>(options => {
        options.TextEncoderSettings = new TextEncoderSettings(UnicodeRanges.All);
    });
    

    That removes the entity encoding for special characters, while <, > and & still get encoded.
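
    The effect can be checked outside of a web app. This is a minimal sketch, assuming a .NET Core console project where System.Text.Encodings.Web is available; the class and method names are made up for illustration. It builds an encoder from the same TextEncoderSettings and compares it with the default one.

    ```csharp
    using System.Text.Encodings.Web;
    using System.Text.Unicode;

    public static class EncoderComparison
    {
        // Default encoder: anything outside Basic Latin becomes a numeric entity.
        public static string EncodeDefault(string text) =>
            HtmlEncoder.Default.Encode(text);

        // Encoder that allows all Unicode ranges; HTML-sensitive characters
        // such as <, > and & are still escaped.
        public static string EncodeAllRanges(string text) =>
            HtmlEncoder.Create(new TextEncoderSettings(UnicodeRanges.All)).Encode(text);
    }
    ```

    With this, EncodeDefault("ÄÖÜ") returns &#xC4;&#xD6;&#xDC;, matching the output I posted earlier, while EncodeAllRanges("ÄÖÜ <b> & ß") returns ÄÖÜ &lt;b&gt; &amp; ß.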

    Now only to find out how to inject into Umbraco without recompiling. I miss the old days where configuration was simply done through an XML file.

    Here is a relevant ticket on GitHub: https://github.com/aspnet/HttpAbstractions/issues/315

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 15:27
    Alexander Gräf
    1

    Dug deeper:

    For .NET Framework, it's quite different. The Razor page calls System.Web.WebPages.WebPageBase.Write, which calls System.Web.WebPages.WebPageExecutingBase.WriteTo, which calls System.Web.HttpUtility.HtmlEncode to encode the string, which in turn calls HttpEncoder.Current.HtmlEncode.

    HttpEncoder.Current is initialized from <httpRuntime encoderType="" /> in web.config. If that is not specified, it uses System.Web.Security.AntiXss.AntiXssEncoder with newer versions of .NET Framework, which encodes a lot more characters than necessary.

    But even the default HttpEncoder encodes non-ASCII characters as entities, by calling System.Net.WebUtility.HtmlEncode:

    ASCII characters from 160 to 255 &#NNN;, where NNN is the three-digit decimal character code

    https://docs.microsoft.com/en-us/dotnet/api/system.web.util.httpencoder.htmlencode?view=netframework-4.8

    So this is all pretty ugly, basically you have to provide your own HttpEncoder implementation that overrides HtmlEncode and HtmlAttributeEncode. It's possible to put it into App_Code and reference it from the web.config without a strong assembly name. I won't post code here as it's probably weakening security to do the encoding yourself instead of having AntiXssEncoder do it, and the whole API is going to change anyway when Umbraco moves to .NET Core some time in the future.

  • Søren Gregersen 441 posts 1884 karma points MVP 2x c-trib
    Jan 21, 2021 @ 16:22
    Søren Gregersen
    0

    Hi Alexander Gräf,

    I don't think there is much we can do about this issue in the umbraco forum. Also, I guess encoding the characters in the way that you mention, would be from the xhtml days, when html had to be xml compliant :)

    2021 looks to be the year Umbraco moves to .Net Core - https://umbraco.com/blog/status-of-migration-to-net-core-december-2020/. This would open up for using some of the first tweaks you mention.

    I think that we will also soon see a version on .Net 5, depending on how much needs to be changed.

    HTH :)

  • Alexander Gräf 25 posts 131 karma points
    Jan 21, 2021 @ 16:29
    Alexander Gräf
    1

    I don't see how that has anything to do with XML, as it has the same encoding rules. <, > and & have to be encoded; everything else depends on the charset of the file, which is usually UTF-8 and can accommodate most characters without encoding.

    As per my last comment, one can override the HttpEncoder, which I am already doing, and thus fix the problem.

    When Umbraco moves to .NET Core, the IServiceCollection API is used instead, which means one will not be able to change encoding settings without recompiling Umbraco. Umbraco might choose to allow configuring the encoding via a different route, but I doubt it's a big priority right now. I mean it's really only going to be a problem with Cyrillic, Arabic and Asian pages, where it causes a lot of bloat, and that seems not to be an important market for Umbraco.

  • Alexander Gräf 25 posts 131 karma points
    Jan 22, 2021 @ 00:36
    Alexander Gräf
    100

    So, I dug even deeper. It turns out that the .NET Framework even has a compiler macro called ENTITY_ENCODE_HIGH_ASCII_CHARS that influences the encoding.

    Anyway, it turns out, despite what the documentation says, AntiXssEncoder is NOT the default encoder, even with newer versions of .NET Framework, and it actually mitigates the situation by not encoding high ASCII characters to numeric entities.

    <httpRuntime encoderType="System.Web.Security.AntiXss.AntiXssEncoder" />
    

    This change will fix the issue, without having to provide a custom HttpEncoder implementation, although I am not yet sure if AntiXssEncoder will interfere with the Umbraco backend in some way.
