Removing <span> and <p> tags from shortened RTE content

ThomasBrunbjerg 90 posts 182 karma points

Aug 29, 2017 @ 11:20

I need to display the first 200 or so characters of some content entered in an RTE. So far I have done this with a function that truncates whatever string you pass in to a desired length. It currently removes the <p> tag that appears at the start by letting the substring start at index 3, but sometimes other tags are added as well.
```
public static string TruncateAtWord(string input, int length)
{
    if (input == null || input.Length < length)
        return input;
    int iNextSpace = input.LastIndexOf(" ", length, StringComparison.Ordinal);
    return string.Format("{0}...", input.Substring(3, (iNextSpace > 0) ? iNextSpace : length).Trim());
}
```
This works fine, though it doesn't take into account the added tags the RTE creates, which are returned when I use the ToString() method.

The function itself is called like this:
```
@TruncateAtWord(item.GetPropertyValue("sommerhusFuldBeskrivelse").ToString(), 200)
```
How can I take into account the extra tags that the RTE creates behind the scenes when I shorten my content? Is there a way to stop the RTE from adding these tags, since I already have the text wrapped in a <p> tag?
Laurence Gillian 600 posts 1219 karma points

Aug 30, 2017 @ 14:12

Html Agility Pack can be used for this purpose; see: https://stackoverflow.com/questions/12787449/html-agility-pack-removing-unwanted-tags-without-removing-content

However, it may be easier, and carry less risk of something breaking in the middle of the night, to add a separate field for this excerpt that uses the textarea property editor rather than the rich text editor.
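
For the Html Agility Pack route, here is a minimal sketch of how the tag stripping might look. It assumes the HtmlAgilityPack NuGet package is installed; the class and method names (`RteExcerpt`, `ToPlainText`) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Net;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper class, named for illustration only.
public static class RteExcerpt
{
    // Returns the text of an HTML fragment with element tags removed
    // and HTML entities decoded.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrWhiteSpace(html))
            return string.Empty;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text nodes and drops the element tags.
        string text = doc.DocumentNode.InnerText;

        // Decode entities such as &amp; that the RTE may have stored.
        return WebUtility.HtmlDecode(text).Trim();
    }
}
```

The result could then be passed to a truncation function such as the `TruncateAtWord` from the question, with the substring starting at index 0 because there is no leading tag left to skip.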

Paul Griffiths 370 posts 1021 karma points

Aug 31, 2017 @ 18:44
Hi Thomas,

Whenever I need to strip the HTML from the RTE output, I tend to use the following helper method, passing in the value of the RTE property from the doc type.
```
library.StripHtml()
```
If I want to truncate the content to a certain number of characters, I use something like this:
```
@{
    var contentSnippet = Umbraco.Truncate(library.StripHtml(Model.Content.GetPropertyValue<string>("mainContent")), 120, true);
}
```
Then I output the truncated snippet, without HTML:
```
<p>@contentSnippet</p>
```
Hopefully that is what you were trying to achieve?

Thanks

Paul

David Armitage 510 posts 2082 karma points

Sep 01, 2017 @ 04:17

Hi Guys,

Here are a few helper methods. I usually add these in as String Extension Methods.

I haven't used them in a long time so please give them a good test.

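```
// Requires: using System.Text.RegularExpressions;

public static string StripHTML(string htmlString)
{
    // Lazily match everything between "<" and ">", including across newlines.
    string pattern = @"<(.|\n)*?>";

    return Regex.Replace(htmlString, pattern, string.Empty);
}

public static string HtmlGetFirstParagraph(string htmlString)
{
    // Capture the content of the first attribute-less <p>...</p>;
    // "." does not match newlines, so the content must sit on one line.
    Match m = Regex.Match(htmlString, @"<p>\s*(.+?)\s*</p>");
    if (m.Success)
    {
        return m.Groups[1].Value;
    }
    else
    {
        return string.Empty;
    }
}
```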
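
Putting the ideas in this thread together, stripping the tags first and truncating afterwards avoids the hard-coded index 3 from the question. The following is only a sketch using standard .NET classes; `ExcerptHelper` and `StripAndTruncate` are hypothetical names, and replacing each tag with a space is an assumption made here so that adjacent paragraphs do not run together.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class ExcerptHelper
{
    public static string StripAndTruncate(string html, int length)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // Replace every tag with a space (assumption: keeps "</p><p>" from
        // gluing two words together), then decode entities.
        string text = Regex.Replace(html, @"<[^>]*>", " ");
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace left behind by the removed tags.
        text = Regex.Replace(text, @"\s+", " ").Trim();

        if (text.Length <= length)
            return text;

        // Cut at the last space at or before the limit so no word is split.
        int lastSpace = text.LastIndexOf(' ', length);
        int cut = lastSpace > 0 ? lastSpace : length;
        return text.Substring(0, cut).TrimEnd() + "...";
    }
}
```

In a view this might be called as `@ExcerptHelper.StripAndTruncate(item.GetPropertyValue<string>("sommerhusFuldBeskrivelse"), 200)`. One trade-off of the space replacement is that inline tags in the middle of a word would split that word, so it is worth testing against real RTE output.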
