The error I'm getting when trying to save my xslt file is:
System.Xml.Xsl.XslLoadException: 'ucomponents.xml.Parse()' is an unknown
XSLT function. An error occurred at C:\Users\Stefan\Documents\My Web
Sites\CD\xslt\634576611715742106_temp.xslt(15,1).
at System.Xml.Xsl.XslCompiledTransform.LoadInternal(Object stylesheet, XsltSettings settings, XmlResolver stylesheetResolver)
at umbraco.presentation.webservices.codeEditorSave.SaveXslt(String
fileName, String oldName, String fileContents, Boolean ignoreDebugging)
When bodyText only contains a paragraph with text inside (let's say <p>This is a test</p>), everything works fine and <p>This is a test</p> shows up in the textarea.
When bodyText contains any other html inside the <p></p> tags, I get an error saying:
<Exception Type="System.Xml.XmlException">
  <Message>There are multiple root elements. Line 4, position 2.</Message>
  <StackTrace>
    <Frame>System.Xml.XmlTextReaderImpl.Throw(Exception e)</Frame>
    <Frame>System.Xml.XmlTextReaderImpl.Throw(String res, String arg)</Frame>
    <Frame>System.Xml.XmlTextReaderImpl.ParseDocumentContent()</Frame>
    <Frame>System.Xml.XmlTextReaderImpl.Read()</Frame>
    <Frame>System.Xml.XPath.XPathDocument.LoadFromReader(XmlReader reader, XmlSpace space)</Frame>
    <Frame>System.Xml.XPath.XPathDocument..ctor(TextReader textReader)</Frame>
    <Frame>uComponents.Core.XsltExtensions.Xml.Parse(String xml)</Frame>
  </StackTrace>
</Exception>
From there, you can add templates for specific things, e.g. you wanted to skip the <h2>'s - just add an empty template for them then:
<xsl:template match="h2" /><!-- Sorry, no rooom for you... -->
I know this is an old topic and I apologize. But I am using this for our client's mobile site due to the design. Everything works except the instances where we have macros in the RTE. These are unavoidable due to the clients' design restrictions and desire for control.
Is there a way I can render the RTE content fully before parsing it in my macro?
If not, can I target it to be excluded as well.
Maybe I do not understand the protocol. But currently all pages but those work fine and they are throwing this error:
Unexpected end of file while parsing PI has occurred. Line 6, position 613.
System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
System.Xml.XmlTextReaderImpl.ParsePIValue(Int32& outStartPos, Int32& outEndPos)
System.Xml.XmlTextReaderImpl.ParsePI(StringBuilder piInDtdStringBuilder)
System.Xml.XmlTextReaderImpl.ParseElementContent()
System.Xml.XPath.XPathDocument.LoadFromReader(XmlReader reader, XmlSpace space)
System.Xml.XPath.XPathDocument..ctor(TextReader textReader)
uComponents.XsltExtensions.Xml.ParseXml(String xml, String xpath)
I've had the same problem once in a while and I just dug out one of the "solutions" I've been using - basically, I sacrifice the WYSIWYG handling when there's a macro on the page, which of course is a call you can only make when you know your solution well.
Here goes:
<!-- Let's make a variable for this -->
<xsl:variable name="macroStart" select="'&lt;?UMBRACO_MACRO '" />
<!-- Any macros on the page? -->
<xsl:if test="contains($currentPage/bodyText, $macroStart)">
<xsl:value-of select="umbraco.library:RenderMacroContent($currentPage/bodyText, $currentPage/@id)" disable-output-escaping="yes" />
</xsl:if>
<!-- Otherwise, handle WYSIWYG content... -->
<xsl:apply-templates select="$currentPage/bodyText[normalize-space()][not(contains(., $macroStart))]" mode="WYSIWYG" />
Strip certain html tags
Hi.
I'm in a situation where I would like to strip certain HTML tags from the output of bodyText with XSLT. More precisely, it's the header tags I don't want to show.
This is the compromise I could come up with.
An example could be something like this:
Which should be:
By the way, I did find this thread, but I wonder if it can be done in an easier way?
http://our.umbraco.org/forum/developers/xslt/10272-Remove-attributes-from-html-tags-in-xslt
Thanks in advance!
Hi Stefan,
If you happen to be using uComponents, then you could try the XML XsltExtension method called Parse(). This will take the HTML from your 'bodyText' property and convert it to XML.
Then you can use the variable to select the XML (HTML) nodes that you want...
Cheers, Lee.
Hmmm. My first initial idea:
:-)
Thank you for your replies!
Lee, I think uComponents is the way to do it. I have installed the package, but I can't get it to work.
I have registered the extension in xsltExtensions.config as:
and added the following prefix attributes to the xsl:stylesheet element:
The error I'm getting when trying to save my xslt file is:
What am I missing?
Rodion, I didn't even think of using CSS for that - keeping that in mind will prove useful in other situations, but unfortunately it can't be done in this situation :(
Hi Stefan,
Sorry, it was a typo in my example (I was coding by hand) ... it should be:
(I'd put a period "." instead of a colon ":" - doh!)
Cheers, Lee.
Well, that's what happens when you (= me) are copy-pasting without paying attention...!
I'm getting some strange errors that I can't interpret.
I have put this textarea right after the beginning of a for-each loop for testing purposes:
When bodyText only contains a paragraph with text inside (let's say <p>This is a test</p>), everything works fine and <p>This is a test</p> shows up in the textarea.
When bodyText contains any other html inside the <p></p> tags, I get an error saying:
Do you have any clues about what's causing that?
Thanks again!
I have just learned that it's because bodyText contains more than one root element.
Can I overcome this in any way, and still strip all tags other than the paragraphs?
For example, this will fail on line 4, position 2:
Hi Stefan,
Ah yes, it must be valid XML, so would need a single root tag... try this:
It's a little bit hacky, but had to encode the angle-brackets :-$
Cheers, Lee.
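The snippet from this reply is not preserved above, so what follows is only a minimal sketch of the wrapping idea, not Lee's exact code. It assumes the extension is registered under the alias `ucomponents.xml` (which Umbraco would expose as the namespace `urn:ucomponents.xml`); the `root` wrapper element and the `html` variable are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ucomponents.xml="urn:ucomponents.xml"
    exclude-result-prefixes="ucomponents.xml">

  <xsl:output method="xml" omit-xml-declaration="yes" />
  <xsl:param name="currentPage" />

  <xsl:template match="/">
    <!-- Wrap bodyText in a single (hypothetical) <root> element so the parser
         sees exactly one root. The angle brackets are escaped because they
         sit inside an attribute value. -->
    <xsl:variable name="html"
        select="ucomponents.xml:Parse(concat('&lt;root&gt;', $currentPage/bodyText, '&lt;/root&gt;'))" />

    <!-- Output only the paragraphs; headers, lists and images are left out -->
    <xsl:copy-of select="$html//p" />
  </xsl:template>

</xsl:stylesheet>
```

Note that `xsl:copy-of` keeps each paragraph's attributes and inline markup as they are, so this on its own does not strip classes.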
Hi guys,
I'll just chip in with another gotcha you might run into (sorry Lee, I KNOW I should have submitted bugs long ago for these :-)
- bodyText may at some point contain the dreaded non-breaking space, and THAT will wreak havoc again...
I've wrapped up most of this into a nice little include file that I use - it's available as a Gist for now: https://gist.github.com/1171897
/Chriztian
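For anyone not using the include file, one possible workaround for the non-breaking space is to swap the named entity for its numeric form before parsing, since `&#160;` needs no DTD while `&nbsp;` does. This is a sketch under assumptions and not what the Gist does: it assumes the `ucomponents.xml` alias is registered, and the `safeText` and `html` variables and the `root` wrapper are hypothetical names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:umbraco.library="urn:umbraco.library"
    xmlns:ucomponents.xml="urn:ucomponents.xml"
    exclude-result-prefixes="umbraco.library ucomponents.xml">

  <xsl:output method="xml" omit-xml-declaration="yes" />
  <xsl:param name="currentPage" />

  <xsl:template match="/">
    <!-- Replace the literal text "&nbsp;" with the numeric reference "&#160;" -->
    <xsl:variable name="safeText"
        select="umbraco.library:Replace($currentPage/bodyText, '&amp;nbsp;', '&amp;#160;')" />

    <!-- Wrap in a single (hypothetical) root element, then parse -->
    <xsl:variable name="html"
        select="ucomponents.xml:Parse(concat('&lt;root&gt;', $safeText, '&lt;/root&gt;'))" />

    <xsl:copy-of select="$html//p" />
  </xsl:template>

</xsl:stylesheet>
```

Other named HTML entities in bodyText (for example `&copy;`) would need the same treatment, which is one reason a ready-made include is the more robust route.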
@Chriztian: With the next (major) version of uComponents (v4.x) I'm planning on using HtmlAgilityPack to parse the HTML - that should handle all the quirks much better! In the meantime, any bugs, etc ... CodePlex me! (oooo how rude! LOL)
@Chriztian: Forgot to say - about your gist snippet ... the "EditorContent" entity is very very clever and cool!
Hi Lee,
Now look - I just went and reported TWO issues in the same day (even same hour :-). "How do you like them apples?"
Thanks!
/Chriztian
oooh I like apples!
Thanks again for your replies!
And Chriztian, you were right, the &nbsp; sure wreaked havoc again!
I have included the xslt file, but because of my lack of experience with templates in xslt, I can't figure out how to include it :/
Hi Stefan,
OK - here's a complete sample that should get you going:
The crucial line is the one I've highlighted, which tells the processor to basically use the entry template in the _WYSIWYG.xslt file (because it also has the mode="WYSIWYG" specified).
From there, you can add templates for specific things, e.g. you wanted to skip the <h2>'s - just add an empty template for them then:
/Chriztian
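The complete sample from this reply is not preserved above. A minimal sketch of what such a macro stylesheet could look like, assuming `_WYSIWYG.xslt` sits in the same folder and exposes an entry template in the `WYSIWYG` mode as described:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes" />
  <xsl:param name="currentPage" />

  <!-- Pull in the templates from the include file (path assumed relative to this file) -->
  <xsl:include href="_WYSIWYG.xslt" />

  <xsl:template match="/">
    <!-- Hand bodyText to the include's entry template by using the same mode -->
    <xsl:apply-templates select="$currentPage/bodyText" mode="WYSIWYG" />
  </xsl:template>

  <!-- Empty template: h2 elements produce no output -->
  <xsl:template match="h2" />

</xsl:stylesheet>
```

The empty `h2` template wins over a generic `match="* | text()"` template in the include because a named element pattern has a higher default priority than a wildcard.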
Thank you yet again :-)
Unfortunately I'm still left in the dark with a few questions - which I hope you will answer.
1. What can I do to make the _WYSIWYG.xslt skip every html tag but the paragraphs? Now it will only skip <h2> and include everything else (images, lists etc.).
2. What about stripping paragraph classes?
3. How can I apply the template on bodyText in conjunction with umbraco.library:TruncateString?
4. Before using the xslt include, I tried using uComponents as suggested by Lee.
Can the two solutions be used together (for example to take care of &nbsp; when using the Parse() function from uComponents) to prevent this from failing?
Sorry for asking all these questions, but templates and XSLT extensions are pretty new to me. Hopefully these questions will prove useful for others too!
PS: Learning a lot of useful stuff right now :-)
Anyone?
Hi Stefan,
Thanks for the nudge :-)
Here goes:
1. One way to do this is to replace the Identity Template (match="* | text()") with a new template that basically just bypasses elements and text - then add another one for those elements you *do* want to copy:
2. Already solved with the above...
3. That's rather tricky - the template with mode="WYSIWYG.excerpt" tries to do a similar thing, whereby only selecting the first paragraph - but it needs tweaking to your particular situation.
4. The _WYSIWYG.xslt already takes care of those two issues (multiple root elements and the &nbsp; thing) if you're executing like in the highlighted line in my previous answer.
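The templates for point 1 are not preserved above. One possible reading, shown as a standalone sketch rather than the actual contents of `_WYSIWYG.xslt`: a catch-all template that outputs nothing for an element or text node but keeps walking into child elements, plus a template that rebuilds the paragraphs you do want.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes" />

  <!-- Bypass: emit nothing for the node itself, but keep descending into
       child elements so nested paragraphs are still reached -->
  <xsl:template match="* | text()">
    <xsl:apply-templates select="*" />
  </xsl:template>

  <!-- Whitelist: rebuild <p> as a literal element, so attributes such as
       class are dropped; value-of flattens any inline markup to plain text -->
  <xsl:template match="p">
    <p>
      <xsl:value-of select="." />
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Run against `<root><h2>Title</h2><p class="intro">Hello <b>world</b></p><ul><li>x</li></ul></root>`, this sketch keeps only the paragraph, without its class and without the `<b>` tag. If inline markup such as links should survive, the `xsl:value-of` could be swapped for `xsl:apply-templates` with extra whitelist templates for those elements and for text inside paragraphs.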
Let us know how it goes!
/Chriztian
I know this is an old topic and I apologize. But I am using this for our client's mobile site due to the design. Everything works except the instances where we have macros in the RTE. These are unavoidable due to the clients' design restrictions and desire for control.
Maybe I do not understand the protocol. But currently all pages but those work fine and they are throwing this error:
Hi Ashley,
I've had the same problem once in a while and I just dug out one of the "solutions" I've been using - basically, I sacrifice the WYSIWYG handling when there's a macro on the page, which of course is a call you can only make when you know your solution well.
Here goes:
(Yes, I know about the <xsl:choose> construct - I just try not to use it for simple stuff like this A/B case :-)
Hope it helps,
/Chriztian
I was afraid of that but it makes sense. Thank you!
Come to think of it - it should actually be possible to have the _WYSIWYG.xslt handle this automatically, by detecting the macro(s) and then using RenderMacroContent() first - Hmmmm???!!... (evil laughing ensues :-)
/Chriztian