c# antixsslibrary

Is Sanitizer.GetSafeHtmlFragment supposed to remove <br> elements?


MS's AntiXSS (v4.2.1) Sanitizer.GetSafeHtmlFragment(string) method is removing <br> and <br /> tags from my input. Is this supposed to happen? Is there a way around it?

It seems to be removing \n and \r characters too, so I cannot call Replace() after the sanitizer has done its job.


Solution

  • The 4.2.x release was motivated by a security vulnerability found in the HTML sanitizer itself.

    However, besides fixing the vulnerability, the sanitizer appears to have been made much more aggressive, to the point of being almost unusable. There is a reported issue about this on the WPL CodePlex site (GetSafeHtmlFragment replacing all html tags).

    If your problem is only with the <br> tag and you want to stick with the AntiXSS sanitizer, you can implement an ugly workaround: pre-process your input to hide the tags, then post-process the sanitizer's result to restore them.

    Something like this (code for illustrative purposes only):

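    using System;
    using Microsoft.Security.Application; // AntiXSS 4.2.x: provides Sanitizer

    class Program
    {
        // Placeholder that stands in for <br> while the sanitizer runs.
        // Assumes the input never contains this literal text.
        const string BrMarker = "|br|";

        static void Main(string[] args)
        {
            string input = "<br>Hello<br/>World!";

            // 1. Hide the <br> tags, 2. sanitize, 3. restore the tags.
            input = EscapeHtmlBr(input);
            string result = Sanitizer.GetSafeHtmlFragment(input);
            result = UnescapeHtmlBr(result);

            Console.WriteLine(result);
        }

        // Replaces the common spellings of the tag with the marker (case-sensitive).
        private static string EscapeHtmlBr(string input)
        {
            return input
                .Replace("<br>", BrMarker)
                .Replace("<br />", BrMarker)
                .Replace("<br/>", BrMarker);
        }

        // Turns every marker back into a <br /> tag.
        private static string UnescapeHtmlBr(string result)
        {
            return result.Replace(BrMarker, "<br />");
        }
    }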
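
    The workaround above matches only three exact lowercase spellings of the tag and relies on a fixed |br| marker that a user could type into the input. A slightly more defensive variant might look like the sketch below. Everything in it is an assumption for illustration: BrPreserver and SanitizeKeepingBr are hypothetical names, the sanitizer is passed in as a delegate rather than called directly, and the sketch assumes the sanitizer leaves plain alphanumeric text untouched. It does not handle <br> tags that carry attributes.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Hypothetical helper, not part of AntiXSS.
    public static class BrPreserver
    {
        // Matches <br>, <BR>, <br/>, <br /> and similar spellings, ignoring case.
        // Tags with attributes (e.g. <br class="x">) are deliberately not matched.
        private static readonly Regex BrTag =
            new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

        // 'sanitize' stands in for Sanitizer.GetSafeHtmlFragment.
        public static string SanitizeKeepingBr(string input, Func<string, string> sanitize)
        {
            // A fresh alphanumeric token per call, so the input cannot
            // contain the marker in advance.
            string marker = "BRMARK" + Guid.NewGuid().ToString("N");

            string escaped = BrTag.Replace(input, marker);   // hide the tags
            string sanitized = sanitize(escaped);            // run the sanitizer
            return sanitized.Replace(marker, "<br />");      // restore the tags
        }
    }
    ```

    With AntiXSS referenced, the call would be along the lines of BrPreserver.SanitizeKeepingBr(input, Sanitizer.GetSafeHtmlFragment). It is worth checking against your own sanitizer version that the marker really does come through unchanged.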