
Convert Unicode string into proper string


I have a string that contains Unicode data.

I want to write it to a file. When the data is written, the file contains Unicode escape sequences instead of the actual text for languages other than English.

string originalString = ((char)(buffer[index])).ToString();
//sb.Append(DecodeEncodedNonAsciiCharacters(originalString.ToString()));
foreach (char c1 in originalString)
{
    // test if char is ascii, otherwise convert to Unicode Code Point
    int cint = Convert.ToInt32(c1);
    if (cint <= 127 && cint >= 0)
        asAscii.Append(c1.ToString());
    else
    {
        //String s = Char.ConvertFromUtf32(cint);
        asAscii.Append(String.Format("\\u{0:x4} ", cint).Trim());
       // asAscii.Append(s);
    }
}

sb.Append((asAscii));
Console.WriteLine();

When I open the output file, the data looks like this:

1 00:00:27,709-->00:00:32,959 1.2 \u00e0\u00a4\u0085\u00e0\u00a4\u00b0\u00e0\u00a4\u00ac \u00e0\u00a4\u00b2\u00e0\u00a5\u008b\u00e0\u00a4\u0097 28 \u00e0\u00a4\u00b0\u00e0\u00a4\u00be\u00e0\u00a4\u009c\u00e0\u00a5\u008d\u00e0\u00a4\u00af \u00e0\u00a4\u0094\u00e0\u00a4\u00b0 \u00e0\u00a4\u00b8\u00e0\u00a4\u00be\u00e0\u00a4\u00a4 \u00e0\u00a4\u0095\u00e0\u00a5\u0087\u00e0\u00a4\u0082\u00e0\u00a4\u00a6\u00e0\u00a5\u008d\u00e0\u00a4\u00b0 \u00e0\u00a4\u00b6\u00e0\u00a4\u00be\u00e0\u00a4\u00b8\u00e0\u00a4\u00bf\u00e0\u00a4\u00a4 \u00e0\u00a4\u00aa\u00e0\u00a5\u008d\u00e0\u00a4\u00b0\u00e0\u00a4\u00a6\u00e0\u00a5\u0087\u00e0\u00a4\u00b6

But it should look like this:

1 00:00:27,400 --> 00:00:32,760 1.2 अरब लोग 28 राज्य और सात केंद्र शासित प्रदेश

I have tried many things, but none of them has worked.


Solution

using System;
using System.Text;

string unicodeString = "This string contains the unicode character Pi(\u03a0)";

// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;

// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);

// Perform the conversion from one encoding to the other.
// ASCII cannot represent Pi, so that character is replaced with '?'.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);

// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
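
The escaped output in the question looks like UTF-8 bytes written one per escape: \u00e0\u00a4\u0085 is the byte sequence E0 A4 85, which is the UTF-8 encoding of अ. If that is the case, decoding the whole buffer as UTF-8 may be all that is needed, rather than casting each byte to a char. The following is only a sketch under that assumption. The class and method names (Utf8Repair, DecodeBuffer, UnescapeUtf8Bytes, WriteUtf8) are made up for illustration and do not come from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper; assumes the source bytes are UTF-8.
public static class Utf8Repair
{
    // Decode the whole buffer in one call. A UTF-8 character can span
    // several bytes, so casting each byte to char splits it apart.
    public static string DecodeBuffer(byte[] buffer)
    {
        return Encoding.UTF8.GetString(buffer);
    }

    // Repair text that was already written with one "\u00XX" escape per byte.
    // Assumes everything outside the escapes is plain ASCII, as in the question.
    public static string UnescapeUtf8Bytes(string escaped)
    {
        var bytes = new List<byte>();
        int i = 0;
        while (i < escaped.Length)
        {
            int value;
            bool isByteEscape =
                i + 6 <= escaped.Length
                && escaped[i] == '\\'
                && escaped[i + 1] == 'u'
                && int.TryParse(escaped.Substring(i + 2, 4),
                                NumberStyles.AllowHexSpecifier,
                                CultureInfo.InvariantCulture,
                                out value)
                && value <= 0xFF;

            if (isByteEscape)
            {
                // The escape holds a single raw byte value.
                bytes.Add((byte)value);
                i += 6;
            }
            else
            {
                // Any other character is passed through as its UTF-8 bytes.
                bytes.AddRange(Encoding.UTF8.GetBytes(new[] { escaped[i] }));
                i++;
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }

    // Write the decoded text as UTF-8 (without a byte order mark).
    public static void WriteUtf8(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(false));
    }
}
```

With the buffer from the question, something like Utf8Repair.WriteUtf8("output.srt", Utf8Repair.DecodeBuffer(buffer)) would replace the per-character loop; the file name here is just a placeholder. UnescapeUtf8Bytes is only useful for files that were already written in the escaped form.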