Copied to clipboard

Flag this post as spam?

This post will be reported to the moderators as potential spam to be looked at


  • Chuck Kirklen 36 posts 184 karma points
    Nov 01, 2016 @ 23:40
    Chuck Kirklen
    0

    Error attempting to render ≥ and other special chars in PDF

    When attempting to render less-than or less-than-or-equal signs in PDF, we're getting the following error:

    An exception occured parsing your content as XML: Name cannot begin with the '0' character, hexadecimal value 0x30. Line 2, position 45.
    

    Those chars render correctly in the Umbraco view, but when PDFCreator outputs XSL-FO, that's when we're seeing the error.

              <fo:block font-size="9pt"> 
                 @ParseRichText(FoHelper.Instance.GetRichTextNodes(@Model.assayRange))
              </fo:block>
    

    The assayRange property is an RTE and contains HTML.

    '<" char renders fine in Umbraco view

    error msg displayed in the PDF

    the XSL-FO generated

    plain text showing error msg

    What do I need to do to get special chars to render properly in the PDF output?

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Nov 02, 2016 @ 08:10
    Darren Ferguson
    0

    What is the raw HTML of the rich text area that is causing the error?

    The exception appears to be thrown by the helper that tries to read it as XML and it appears that you have an invalid tag that begins <0

    If you post the HTML of the RTE, it'll probably be quite easy to spot.

    Thanks.

  • Chuck Kirklen 36 posts 184 karma points
    Nov 02, 2016 @ 14:42
    Chuck Kirklen
    0

    I'm sure it's something simple but it's escaping me and probably would stick out like a sore thumb to trained eyes ;-)

    Here's the RTE HTML, dumbed-down to the simplest case:

    <p>&lt;0.5</p>
    

    enter image description here I thought maybe if I replaced "@node.Value" with "@Html.Raw(node.Value)" in the ParseRichText() helper, that might do the trick, but it still throws the error. Also tried @node.Value.Replace("&","&amp;") and variations.

    @helper ParseRichText(XmlNodeList nodes) {
    foreach(XmlNode node in nodes) {
    
       switch(node.NodeType)
       {
           case XmlNodeType.Text:
             @node.Value
             @ParseRichText(node.ChildNodes);
             break;
           case XmlNodeType.Element:
             @ParseElement(node);
             break;
           default:
             @ParseRichText(node.ChildNodes);
             break;
       }
     }
    }
    

    We have lots of data with these < ≤ ≥ and ° chars in them, and as soon as we hit one of those, it kind of runs off into the ditch when rendering in PDF.

    I appreciate your help!

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Nov 03, 2016 @ 08:04
    Darren Ferguson
    0

    Out of interest does it work if you remove the < from the RTE. Though it is odd as the RTE has already escaped it to < which should be fine.

    Also what happens if you remove @node.Value completely from the ParseRichText helper?

    Also try temporarily write node.Value wrapped by System.Web.HttpUtility.HtmlEncode()

    I won't have a moment until next week - but I could give it a go myself...

  • Chuck Kirklen 36 posts 184 karma points
    Nov 03, 2016 @ 17:13
    Chuck Kirklen
    0

    Tried all of the suggestions above, as well as wrapping @node.Value in the helper with @Html.Raw(), but none worked. Also tried tacking .Replace() on the end of node.Value to convert < to < and &. Also tried configuring tinyMceConfig.config to use named instead of raw.

    If i remove the < from the RTE, i don't get an error, but ≥ renders as Pounds Sterling char, ≤ renders as superscript 3, and > renders properly as >. Strange. Is the template not using UTF-8, maybe?

    I've made a super-simple version of the template below. If you create a node (this is Umbraco 7.5.3) with a property named "assayRange" of type RTE and enter the same data shown in the images below, you should see what i'm seeing.

    I modified the PDFRazorHelloWorld template to make this as simple as possible (see below).

    RTE contents RTE source

    Code wouldn't paste properly here in a code block or a quote, so i screen grabbed it for you to review. I'd be more than happy to email it. Top part of template Bottom part of template

  • Chuck Kirklen 36 posts 184 karma points
    Nov 07, 2016 @ 22:48
    Chuck Kirklen
    0

    Darren, Also, for completeness, i tried simply printing out the contents of the RTE on line 30 above - it didn't render valid XSL-FO, of course, but in the raw output I could see what it was outputting.

    Commending out line 11 and changing line 30 to eliminate the helper:

    <fo:inline font-weight="bold">Hello world! @Model.assayRange</fo:inline>
    

    resulted in this raw output:

    <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
    <fo:block>
    <fo:inline font-weight="bold">Hello world! 
        <p>≤1.5</p>
        <p>≥3.5</p>
        <p>>4.5</p>
        <p>£1.0</p>
    </fo:inline>
    </fo:block>
    </fo:flow>
    </fo:page-sequence>
    

    Does that make sense? Is there a directive at the top of the template that I need to add that isn't in the template above?

    Really a head-scratcher.

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Nov 09, 2016 @ 10:06
    Darren Ferguson
    0

    I'm confused - a < and > render fine on my machine but a ≥ causes an error - could you send me your template by email please? to df at darren-ferguson.com

  • Chuck Kirklen 36 posts 184 karma points
    Nov 09, 2016 @ 15:58
    Chuck Kirklen
    0

    Email sent - the template is one of the installed ones we've edited to isolate the issue, so has all the defaut parts; i've also included a screen grab of the actual test data i'm using (entered into the RTE), in both HTML and raw forms, just so we're testing the same data.

  • Chuck Kirklen 36 posts 184 karma points
    Nov 15, 2016 @ 16:01
    Chuck Kirklen
    0

    edited 2016.16.11

    Darren,

    Almost there! We know that TinyMce is returning &lt; for < and &gt; for > when retrieved in the PDF template, as noted above. So if we change these to encoded equivalents before passing the string to GetRichTextNodes(), then those chars are correctly interpreted:

                <fo:inline font-weight="bold">Hello world! @ParseRichText(FoHelper.Instance.GetRichTextNodes(Model.assayRange.ToString().Replace("&lt;","&amp;lt;").Replace("&gt;","&amp;gt;")))</fo:inline>
    

    Then down in ParseRichText(), if you wrap the @node.Value so that it is @Html.Raw(node.Value), then it renders properly in PDF as the less-than and greater-than symbols.

    But the other math characters, ≤ and ≥, are returned in the PDF template (@Model.assayRange in the example above) as Pounds Sterling (£) and superscript 3 (³).

    I'd love to add Replace("£","&amp;#8804;") to the line above but i can't seem to get a match on the £ symbol so that it will do the replace.

    Weirder, if i were to try to do this Replace("£","&amp;le;"), i get an error that states [Reference to undeclared entity 'le'.].

    Any idea how to snag those and convert them like we did above for less-than and greater-than before passing to GetRichTextNodes()?

    Thanks.

  • Chuck Kirklen 36 posts 184 karma points
    Nov 22, 2016 @ 15:49
    Chuck Kirklen
    0

    Darren, Update with some progress on circumventing the errors when trying to convert <, >, ≤, and ≥

    I can encode the < and > before passing to FoHelper.Instance.GetRichTextNodes() and they survive and are printed out by wrapping the @node.Value with @Html.Raw() in the ParseRichText() helper. And I’m fine with that – makes sense that we have to sneak those past GetRichTextNodes(), which is looking for start and end tags.

    But what I can’t figure out is why ≤ (≤) shows up properly in the RTE but when we ask for it in the PDF template, it appears as £.

    Similarly, ≥ (≥) shows up properly in the RTE but in the PDF template, it shows up as ³.

    They all show up as themselves in the regular Umbraco HTML rendering of the page, it’s only inside the template when we pull the same assayRange RTE field, we get these translations.

    If I could encode those to their real values (&#8804; and &#8805), I know they’ll get through – but I can’t for the life of me figure out what those are so I could do a Replace() on them.

    Please take the revised template inside the Zip and run it against the simple example and see if I need to include something else in order to get those to show up.

    I do know that if I try to convert to “&le;” it throws an error, so that’s why it feels like a code page issue in the template.

    Thanks!

  • Chuck Kirklen 36 posts 184 karma points
    Nov 28, 2016 @ 20:50
    Chuck Kirklen
    101

    Wow. Solved. Our customer actually came across an article that stated that XSL-FO pulls in its own default charset, which doesn't necessarily cover those symbols (less-than-or-equal-to, greater-than-or-equal-to).

    So he just added font-family: sans-serif to the <fo:block> tag and voila! It works!

    So the change was from:

    <fo:block font-size="9pt" color="#000">@ParseRichText(FoHelper.Instance.GetRichTextNodes(EncodeSpecialChars(@specimen.assayRange.ToString())))</fo:block>

    to:

    <fo:block font-family="sans-serif" font-size="9pt" color="#000">@ParseRichText(FoHelper.Instance.GetRichTextNodes(EncodeSpecialChars(@specimen.assayRange.ToString())))</fo:block>

    ==> So perhaps the issue isn't a bug in PDFCreator, per se, it's in the limited charset that XSL-FO uses? By specifying a font with support for those characters like we did here, it seems to work.

    Thanks to Darren and his crew for chasing this one around. It was elusive.

Please Sign in or register to post replies

Write your reply to:

Draft