c# html htmldecode

Decode HTML5 character entities


I am unable to decode the following HTML5 text, 10&colon;00 AM, in my C# code. After calling HttpUtility.HtmlDecode("10&colon;00 AM"); I get the same string back instead of the desired output "10:00 AM".

However, when I use other HTML character entities like &amp; or &gt;, HttpUtility.HtmlDecode gives the desired output. Is there a way to decode HTML5 character entities in C#?

I have also tried System.Net.WebUtility.HtmlDecode and System.Uri.UnescapeDataString, with the same result.


Solution

  • As commented by Svein, this is an issue with the .NET Framework not supporting HTML5 entities.

    Since the .NET Framework has gone open source, you can check the code and change it yourself, as someone already did. If you check out that pull request, you can see the problem: there is a breaking change between HTML4 entities and HTML5 entities, and the maintainers did not agree on how to fix it. That simply means the .NET Framework will not support HTML5 entities until a design decision is made.

    In the meantime, you could take the diff of that commit and create your own HTML5 entity parser (which is simply a string replacement and some dictionary lookup).
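
    The string replacement plus dictionary lookup described above could look roughly like the sketch below. This is not the code from the pull request. The class name Html5EntityDecoder and the handful of table entries are made up for illustration, and the real HTML5 list has over 2,000 named entities, so you would need to fill in the table yourself. It assumes that WebUtility.HtmlDecode leaves names it does not know untouched, which matches the behaviour described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET Framework.
public static class Html5EntityDecoder
{
    // Illustrative subset only; extend with the HTML5 names you actually need.
    // Values must not contain '&', otherwise the second pass could decode twice.
    private static readonly Dictionary<string, string> Html5Entities =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "colon", ":" },
            { "comma", "," },
            { "period", "." },
            { "lpar", "(" },
            { "rpar", ")" },
            { "excl", "!" },
        };

    // Matches named references such as "&colon;" and captures the name.
    private static readonly Regex NamedEntity =
        new Regex(@"&([A-Za-z][A-Za-z0-9]*);", RegexOptions.Compiled);

    public static string Decode(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // First pass: replace names found in the table, leave all other matches as they are.
        string replaced = NamedEntity.Replace(input, m =>
            Html5Entities.TryGetValue(m.Groups[1].Value, out var value) ? value : m.Value);

        // Second pass: let the framework handle HTML4 names and numeric references.
        return WebUtility.HtmlDecode(replaced);
    }
}
```

    With this in place, Html5EntityDecoder.Decode("10&colon;00 AM") gives "10:00 AM", while &amp; and &gt; are still decoded by the framework call in the second pass.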