Tags: c#, unicode, utf-8, html-encode

Does C# have something like PHP's mb_convert_encoding()?


Is there a way in C# to convert Unicode strings into ASCII plus HTML entities, and then back again? In PHP, I can do it like so:

<?php
// RUN ME AT COMMAND LINE
$sUnicode = '<b>Jöhan Strauß</b>';
echo "UNICODE: $sUnicode\n";
$sASCII = mb_convert_encoding($sUnicode, 'HTML-ENTITIES','UTF-8');
echo "ASCII: $sASCII\n";
$sUnicode = mb_convert_encoding($sASCII, 'UTF-8', 'HTML-ENTITIES');
echo "UNICODE (TRANSLATED BACK): $sUnicode\n";
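For comparison, here is a minimal C# sketch of the same round trip, written in C# 2.0 syntax (no LINQ, no lambdas, no `var`) so it should compile against .NET 2.0. HtmlEntityConverter, ToAsciiEntities and FromAsciiEntities are hypothetical names, not a .NET API. The sketch assumes the goal is to turn every non-ASCII character into a decimal numeric entity while leaving ASCII (including the <b> tags) untouched. Decoding only understands numeric entities (&#246; or &#xF6;), not named ones such as &ouml;.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of .NET. Written in C# 2.0 syntax.
public static class HtmlEntityConverter
{
    // Unicode text -> ASCII: each non-ASCII character becomes "&#N;" (decimal code point).
    // ASCII characters, including '<', '>' and '&', are copied unchanged.
    public static string ToAsciiEntities(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < 128)
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                // A surrogate pair is one code point above U+FFFF (e.g. an emoji).
                sb.Append("&#").Append(char.ConvertToUtf32(c, text[i + 1])).Append(';');
                i++; // skip the low surrogate we just consumed
            }
            else
            {
                sb.Append("&#").Append((int)c).Append(';');
            }
        }
        return sb.ToString();
    }

    // ASCII + entities -> Unicode text. Handles decimal (&#246;) and hex (&#xF6;)
    // numeric entities only; named entities such as &ouml; are left as-is.
    public static string FromAsciiEntities(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            int semi = -1;
            if (text[i] == '&' && i + 1 < text.Length && text[i + 1] == '#')
            {
                semi = text.IndexOf(';', i + 2);
            }
            int codePoint;
            if (semi > i + 2 && TryParseCodePoint(text.Substring(i + 2, semi - i - 2), out codePoint))
            {
                sb.Append(char.ConvertFromUtf32(codePoint));
                i = semi + 1; // continue after the ';'
            }
            else
            {
                sb.Append(text[i]); // not a numeric entity: copy the character
                i++;
            }
        }
        return sb.ToString();
    }

    // Parses "246" or "xF6" and accepts only valid Unicode scalar values,
    // because char.ConvertFromUtf32 throws for surrogates and values above U+10FFFF.
    private static bool TryParseCodePoint(string body, out int codePoint)
    {
        bool ok;
        if (body[0] == 'x' || body[0] == 'X')
        {
            ok = int.TryParse(body.Substring(1), NumberStyles.AllowHexSpecifier,
                              CultureInfo.InvariantCulture, out codePoint);
        }
        else
        {
            ok = int.TryParse(body, NumberStyles.None, CultureInfo.InvariantCulture, out codePoint);
        }
        return ok && codePoint >= 0 && codePoint <= 0x10FFFF
            && (codePoint < 0xD800 || codePoint > 0xDFFF);
    }
}
```

For example, ToAsciiEntities("<b>Jöhan Strauß</b>") returns "<b>J&#246;han Strau&#223;</b>", and FromAsciiEntities turns that back into the original string. Working on code points (via the surrogate-pair helpers on char) rather than on bytes keeps characters outside the Basic Multilingual Plane intact as single entities.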

Background:

  • I need this to work in C# on .NET 2.0: the application is older and we can't move it to a newer .NET Framework version.
  • I handle the PHP backend of this application and wanted to share some tips with the C# frontend team.

Solution

  • Yes, there's Encoding.Convert, although I rarely use it myself:

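    using System.Text;
    string text = "<b>Jöhan Strauß</b>";
    // Encoding.ASCII.GetBytes would replace ö and ß with '?', so start from UTF-8
    byte[] utf8 = Encoding.UTF8.GetBytes(text);
    // Re-encode the same characters as UTF-16 (Encoding.Unicode) bytes
    byte[] utf16 = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8);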
    

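    I rarely find I want to convert from one encoded form to another; it's much more common to perform a one-way conversion from text to binary (Encoding.GetBytes) or vice versa (Encoding.GetString). Note that none of these methods produces HTML entities: Encoding.Convert only changes how the same characters are represented as bytes.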
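A small sketch of the one-way GetBytes/GetString conversions mentioned above, with Encoding.Convert alongside for comparison. EncodingDemo, RoundTrip and ConvertThenDecode are made-up names for illustration; only the Encoding members are real .NET APIs.

```csharp
using System;
using System.Text;

// Hypothetical demo class; only the System.Text.Encoding calls are real .NET APIs.
public static class EncodingDemo
{
    // One-way conversions: text -> bytes with GetBytes, bytes -> text with GetString.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    // Encoding.Convert re-encodes bytes (here UTF-8 -> UTF-16); the decoded
    // text is identical to the input, so no HTML entities appear.
    public static string ConvertThenDecode(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] utf16 = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8);
        return Encoding.Unicode.GetString(utf16);
    }
}
```

RoundTrip(Encoding.UTF8, "Jöhan Strauß") gives the string back unchanged, while RoundTrip(Encoding.ASCII, "Jöhan Strauß") returns "J?han Strau?", because ASCII has no bytes for ö or ß and the default replacement fallback substitutes '?'. That loss is why converting to plain ASCII needs an entity step rather than just a different Encoding.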