Tags: c#, razor, episerver

Remove HTML tags from MainBody


I'm trying to remove all HTML tags from the output of this line of EPiServer code:

 @Html.PropertyFor(m => m.MainBody)

because the result is supposed to go inside an anchor element: <a>example code here</a>

What's a good way to solve this in EPiServer?


Solution

  • First, using an XhtmlString this way is bad practice; that said, we don't always get to choose.

    I'm using the following, a modified version of Rob Volk's extension method:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Text.RegularExpressions;
    
    public static class HtmlStringExtensions
    {
        // Matches an opening tag (name captured as "tag"), a closing tag ("closeTag")
        // and an optional "/" before the ">" of a self-closing tag ("selfClose").
        // Singleline lets ".*?" cross line breaks inside long attribute lists.
        private static readonly Regex TagRegex = new Regex(
            @"<((?<tag>[^\s/>]+)|/(?<closeTag>[^\s>]+)).*?(?<selfClose>/)?\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline);
    
        // HTML void elements never take a closing tag, even when written without "/>", e.g. <br>
        private static readonly HashSet<string> VoidElements = new HashSet<string>(
            new[] { "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr" },
            StringComparer.OrdinalIgnoreCase);
    
        /// <summary>
        /// Truncates a string containing HTML to a number of text characters.
        /// The result contains HTML and any tags left open are closed.
        /// Words may be cut in half, and an entity such as &amp;amp; counts as several characters.
        /// by Rob Volk with modifications
        /// http://robvolk.com/truncate-html-string-c-extension-method/
        /// </summary>
        /// <param name="html">The HTML to truncate.</param>
        /// <param name="maxCharacters">Maximum number of text (non-tag) characters to keep.</param>
        /// <param name="trailingText">Text appended when something was cut off, e.g. "...". May be null.</param>
        /// <returns>The truncated HTML with all open tags closed.</returns>
        public static string TruncateHtmlString(this string html, int maxCharacters, string trailingText)
        {
            if (string.IsNullOrEmpty(html))
                return html;
            if (maxCharacters <= 0)
                return string.Empty;
    
            // Find the spot to truncate:
            // count the text characters and ignore everything between '<' and '>'
            var textCount = 0;
            var charCount = 0;
            var ignore = false;
            foreach (char c in html)
            {
                charCount++;
                if (c == '<')
                {
                    ignore = true;
                }
                else if (!ignore)
                {
                    textCount++;
                }
    
                if (c == '>')
                {
                    ignore = false;
                }
    
                // Stop once we hit the limit (this always happens right after a text character)
                if (textCount >= maxCharacters)
                {
                    break;
                }
            }
    
            // Keep the first charCount characters; building once avoids the
            // quadratic cost of appending to a string one character at a time
            var trunc = new StringBuilder(html.Substring(0, charCount));
    
            // Keep track of open tags so any left open can be closed
            var tags = new Stack<string>();
            foreach (Match match in TagRegex.Matches(trunc.ToString()))
            {
                var tag = match.Groups["tag"].Value;
                var closeTag = match.Groups["closeTag"].Value;
    
                if (!string.IsNullOrEmpty(tag))
                {
                    // Push opening tags, but skip self-closing tags (<br />), void elements (<br>)
                    // and comments or doctypes (<!-- -->, <!DOCTYPE html>)
                    if (string.IsNullOrEmpty(match.Groups["selfClose"].Value)
                        && !VoidElements.Contains(tag)
                        && tag[0] != '!')
                    {
                        tags.Push(tag);
                    }
                }
                else if (!string.IsNullOrEmpty(closeTag))
                {
                    // Pop until the matching opening tag is found, ignoring unclosed tags in between.
                    // Checking Count first avoids an InvalidOperationException on a stray closing tag.
                    while (tags.Count > 0
                           && !string.Equals(tags.Pop(), closeTag, StringComparison.OrdinalIgnoreCase))
                    { }
                }
            }
    
            // Add the trailing text if anything was cut off
            if (html.Length > charCount)
                trunc.Append(trailingText);
    
            // Pop the rest off the stack to close the remaining tags
            while (tags.Count > 0)
            {
                trunc.Append("</").Append(tags.Pop()).Append('>');
            }
    
            return trunc.ToString();
        }
    
        /// <summary>
        /// Truncates a string containing HTML to a number of text characters
        /// without adding trailing text. Any tags left open are closed.
        /// </summary>
        /// <param name="html">The HTML to truncate.</param>
        /// <param name="maxCharacters">Maximum number of text (non-tag) characters to keep.</param>
        /// <returns>The truncated HTML with all open tags closed.</returns>
        public static string TruncateHtmlString(this string html, int maxCharacters)
        {
            return html.TruncateHtmlString(maxCharacters, null);
        }
    
        /// <summary>
        /// Strips all HTML tags from a string. Entities such as &amp;amp; are left encoded.
        /// </summary>
        /// <param name="html">The HTML to strip.</param>
        /// <returns>The text content of the HTML.</returns>
        public static string StripHtml(this string html)
        {
            if (string.IsNullOrEmpty(html))
                return html;
    
            return Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
        }
    }
    

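The tag-closing part of TruncateHtmlString is the trickiest step. As a minimal standalone sketch (TagCloser and CloseOpenTags are hypothetical names, not part of the answer's code or of EPiServer), this reuses the same regex and shows how a stack closes whatever is still open at the cut point:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper that isolates the "close any tags left open" step.
public static class TagCloser
{
    // Same pattern as the extension method: opening tag name, closing tag name,
    // and an optional "/" before ">" for self-closing tags such as <br />.
    private static readonly Regex TagRegex = new Regex(
        @"<((?<tag>[^\s/>]+)|/(?<closeTag>[^\s>]+)).*?(?<selfClose>/)?\s*>",
        RegexOptions.Singleline);

    // Appends a closing tag for every element still open at the end of the fragment.
    public static string CloseOpenTags(string fragment)
    {
        var open = new Stack<string>();
        foreach (Match m in TagRegex.Matches(fragment))
        {
            string tag = m.Groups["tag"].Value;
            string closeTag = m.Groups["closeTag"].Value;

            if (tag.Length > 0 && m.Groups["selfClose"].Value.Length == 0)
            {
                open.Push(tag); // e.g. <p> or <strong class="x">
            }
            else if (closeTag.Length > 0)
            {
                // Pop up to and including the matching opener; Count is checked
                // first so a stray closing tag never pops an empty stack.
                while (open.Count > 0
                       && !string.Equals(open.Pop(), closeTag, StringComparison.OrdinalIgnoreCase))
                { }
            }
        }

        // Innermost element is on top of the stack, so it is closed first.
        var result = new StringBuilder(fragment);
        while (open.Count > 0)
            result.Append("</").Append(open.Pop()).Append('>');
        return result.ToString();
    }
}
```

For example, CloseOpenTags("<p>Hello <strong>wor") returns "<p>Hello <strong>wor</strong></p>", while a fragment that starts with a stray "</p>" is returned unchanged instead of throwing.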
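    Use these extension methods on the string returned by ToHtmlString() on the XhtmlString (EPiServer.Core). Html.PropertyFor expects an expression that points at a property, so it cannot wrap a method call; write the processed string out directly instead. You lose on-page editing for that property, and MainBody may be null if the editor left it empty, so guard against that.

    For example:

    @using EPiServer.Core
    @using System.Net

    @* Plain text inside a link: strip the tags, decode entities such as &amp;, and let Razor encode the result *@
    <a>@WebUtility.HtmlDecode(Model.MainBody.ToHtmlString().StripHtml())</a>

    @* Or keep the markup but limit it to 160 text characters *@
    @Html.Raw(Model.MainBody.ToHtmlString().TruncateHtmlString(160, "..."))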
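If the goal is short plain text for the inside of the <a> element, as in the question, a slightly different sketch may fit better. LinkTextHelper.ToLinkText is a hypothetical helper, not an EPiServer API: it replaces tags with spaces, decodes entities with WebUtility.HtmlDecode, collapses whitespace and cuts at a word boundary. It assumes the input is the markup returned by ToHtmlString() and that a regex-based strip is good enough for editor content; it is not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: turns stored rich-text markup into short plain text
// suitable for the inside of an <a> element.
public static class LinkTextHelper
{
    private static readonly Regex Tags = new Regex(@"<[^>]*>", RegexOptions.Singleline);
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToLinkText(string html, int maxLength, string trailingText = "...")
    {
        if (string.IsNullOrEmpty(html) || maxLength <= 0)
            return string.Empty;

        // 1. Replace each tag with a space so words in adjacent blocks (</p><p>) don't merge.
        string text = Tags.Replace(html, " ");
        // 2. Decode entities (&amp; -> &); Razor's @ re-encodes the result when it is output.
        text = WebUtility.HtmlDecode(text);
        // 3. Collapse whitespace runs (in .NET, \s also matches a decoded &nbsp;) and trim.
        text = Whitespace.Replace(text, " ").Trim();

        if (text.Length <= maxLength)
            return text;

        // 4. Cut at the last space at or before the limit so whole words are kept.
        int cut = text.LastIndexOf(' ', maxLength);
        if (cut <= 0)
            cut = maxLength; // a single very long word: hard cut
        return text.Substring(0, cut).TrimEnd() + trailingText;
    }
}
```

In a view this could be used as <a>@LinkTextHelper.ToLinkText(Model.MainBody.ToHtmlString(), 160)</a>, again guarding against MainBody being null. Cutting at a space rather than mid-word is the main design difference from TruncateHtmlString, which counts characters only.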