  • David W. 159 posts 284 karma points c-trib
    Jul 01, 2010 @ 10:26
    David W.
    0

    Remove attributes from html-tags in xslt

    Hello,

    I'm trying to strip all attributes (specifically 'style') from HTML tags in the XSLT for my RSS feed. I want to keep the HTML tags themselves (<p>, <strong>, etc.), so umbraco.library.StripHtml won't do it.

    I.e., I want "<p style='margin:10px'>some text</p>" to become "<p>some text</p>". How can I achieve this?

    Thanks.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 01, 2010 @ 10:43
    Ismail Mayat
    0

    Sledger,

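    Just found this on Google. Not tested, but you need something like:

     <xsl:template match="p">
       <p>
         <!-- select the attributes but write nothing out, so they are dropped -->
         <xsl:for-each select="@*">
         </xsl:for-each>
         <!-- note: in XSLT 1.0 this outputs only the first text node of the <p>;
              child elements such as <strong> are not copied -->
         <xsl:value-of select="./text()"/>
       </p>
     </xsl:template>

    That will loop through all attributes, and since we don't write anything out in the for-each, they will be ignored.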

    Regards

    Ismail

  • David W. 159 posts 284 karma points c-trib
    Jul 01, 2010 @ 10:57
    David W.
    0

    Thanks for the reply. But the string I want to format is from the bodyText-field, like this:

    ...
    <content:encoded>
              <xsl:value-of select="concat('&lt;![CDATA[ ', ./data [@alias='bodyText'],']]&gt;')" disable-output-escaping="yes"/>
    </content:encoded>
    ...

    Is it possible to apply your method to this as well?

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 01, 2010 @ 11:15
    Ismail Mayat
    0

    Sledger,

    Not sure if this will work but could you do something like

    <xsl:variable name="tmpBodyText">
      <xsl:copy-of select="./data [@alias='bodyText']"/>
    </xsl:variable>
    <xsl:apply-templates select="msxml:node-set($tmpBodyText)//p"/>

     

    Then do what you need to do. Again, not tested, just an idea.

    Regards

    Ismail

  • David W. 159 posts 284 karma points c-trib
    Jul 01, 2010 @ 11:24
    David W.
    0

    Hm, I will give that a try, but I need it to remove attributes from all HTML tags, not just <p> tags.
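
    A minimal sketch of what stripping attributes from every element could look like in XSLT 1.0. It assumes the markup is already available as well-formed nodes rather than as an escaped string, and it is untested in Umbraco:

    ```xslt
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" omit-xml-declaration="yes"/>

      <!-- Any element: xsl:copy copies the element itself but not its
           attributes or children. Only child nodes are processed inside it,
           so no attribute ever reaches the output. -->
      <xsl:template match="*">
        <xsl:copy>
          <xsl:apply-templates select="node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Text nodes: copied through unchanged. -->
      <xsl:template match="text()">
        <xsl:copy/>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Because the template matches "*" rather than "p", nested tags such as <strong> are kept and cleaned as well. Comments and processing instructions are dropped by the built-in template rules.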

  • Lee Kelleher 4026 posts 15836 karma points MVP 13x admin c-trib
    Jul 01, 2010 @ 11:44
    Lee Kelleher
    1

    Hi Sledger,

    Not sure that you're going to be able to do this purely with XSLT.  (It might be possible, but reckon you'll burn hours trying to achieve it!)

    My suggestion is to write an XSLT extension to perform a RegEx against the bodyText, removing specific attributes.

    i.e.

    // requires: using System.Text.RegularExpressions;
    public static string CleanHtml(string html)
    {
        // start by completely removing all unwanted tags
        html = Regex.Replace(html, @"<[/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>", "", RegexOptions.IgnoreCase);
        // then run another pass over the html (twice), removing unwanted attributes;
        // each pass removes at most one matching attribute per tag
        html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase);
        html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase);
        return html;
    }

    Reference to source: http://tim.mackey.ie/CleanWordHTMLUsingRegularExpressions.aspx
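
    A rough sketch of how the RSS template might call such an extension. The "Example" prefix and the urn:Example namespace are hypothetical placeholders for whatever alias the extension is registered under, and match="node" assumes the legacy node/data schema; none of this is tested:

    ```xslt
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical: "Example" / urn:Example stand in for the real
         extension alias and namespace. -->
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:content="http://purl.org/rss/1.0/modules/content/"
                    xmlns:Example="urn:Example"
                    exclude-result-prefixes="Example">

      <xsl:template match="node">
        <content:encoded>
          <!-- string() turns the field into a plain string before it is
               handed to CleanHtml; the result is wrapped in CDATA as before -->
          <xsl:value-of
              select="concat('&lt;![CDATA[ ', Example:CleanHtml(string(./data [@alias='bodyText'])), ']]&gt;')"
              disable-output-escaping="yes"/>
        </content:encoded>
      </xsl:template>
    </xsl:stylesheet>
    ```

    The stored bodyText keeps its style attributes for the website; only the copy written to the feed is cleaned.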

    Good luck, Lee.

  • Rich Green 2246 posts 4008 karma points
    Jul 01, 2010 @ 11:52
    Rich Green
    0

    I've never used it but isn't http://htmlagilitypack.codeplex.com/ ideal for this type of thing?

    Rich

  • Lee Kelleher 4026 posts 15836 karma points MVP 13x admin c-trib
    Jul 01, 2010 @ 11:57
    Lee Kelleher
    0

    Yes, HTML Agility Pack is excellent for navigating, traversing and manipulating HTML as a DOM (and more). You could remove all attributes with it, but it's an extra dependency when a quick-n-dirty RegEx can (could) take care of it, since RegEx is already in the .NET Framework.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 01, 2010 @ 12:04
    Ismail Mayat
    0

    Lee,

    Would my suggestion not work? Or is it that bodyText, unless it is in CDATA, will have entities etc. that will cause it to go boom? PS: I remembered that idea with @* from a Tridion project where we had to clean out some Word crap.

    Regards

    Ismail

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jul 01, 2010 @ 12:18
    Ismail Mayat
    0

    Guys,

    Thinking about this some more, are we not overcomplicating things? We could just update the TinyMCE config file so that the only attribute allowed on p elements is class. True, you would have issues with updates, because you would end up overwriting the TinyMCE config, but in theory that should sort it?

    Regards

    Ismail

  • David W. 159 posts 284 karma points c-trib
    Jul 01, 2010 @ 12:26
    David W.
    0

    The XSLT extension seems to me to be the best solution (I really need to start an Umbraco video subscription so I can see the end of Niels's video ;).

    Ismail: Thanks for your help, but modifying the RTE is not a solution for me because I need the style attributes for the web presentation; the stripped version is only for the RSS.

    Thanks to all.

  • Lee Kelleher 4026 posts 15836 karma points MVP 13x admin c-trib
    Jul 01, 2010 @ 12:32
    Lee Kelleher
    0

    @Ismail:  I've had a quick test of trying to parse the 'bodyText' string as XML, but kept hitting various entity-encoding errors.  I'm sure there is a way to do it, but I keep getting cross-eyed with the entities.  I recall an old forum post about trying to achieve the same thing, and whoever it was ended up using an XSLT extension to convert the content/string to an XPathNodeIterator.  (If I find the topic, I'll post here).

    Cheers, Lee.
