
Unicode characters returning from R.NET


I'm returning a character vector from an R function to C# using R.NET. The only problem is that Unicode characters, such as Greek letters, are being lost. The following line shows the code I'm using:

CharacterVector cvAll = results[5].AsList().AsCharacter();

Here, results is the list returned by the R function. R also writes the same characters to a text file, where they display correctly in Notepad and other editors. Can I get R.NET to return the characters correctly?


Solution

  • It looks like you've run into an open issue with RDotNet: https://github.com/jmp75/rdotnet/issues/25

    Unicode characters don't seem to be supported yet. I ran into the same issue when calling the engine.CreateDataFrame() method: it returned a DataFrame in which all my accented strings were garbled.

    There seems to be a workaround, though: when calling RDotNet functions, if I pass strings that have been converted from UTF-8 (this part is important) to my computer's default encoding (Windows ANSI), R accepts them and gives back correctly interpreted accented strings to C#. I don't know exactly why this works. It might have something to do with .NET strings being stored internally as UTF-16 (cf. http://csharpindepth.com/Articles/General/Strings.aspx), which could explain why converting from UTF-8 to the default ANSI encoding seems to help.
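    A small, self-contained sketch may make the workaround less mysterious. It rests on two assumptions not stated above: that ISO-8859-1 can stand in for the Windows ANSI code page (both are single-byte encodings), and that R.NET hands strings to R through that ANSI code page. Under those assumptions, the UTF-8 to ANSI conversion produces an odd-looking .NET string whose ANSI bytes are exactly the original UTF-8 bytes, so R would end up receiving UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EncodingRoundTripDemo
{
    // The workaround: read the UTF-8 bytes of a string back as if they were
    // single-byte "ANSI" characters.
    public static string ToAnsiFromUtf8(string s, Encoding ansi) =>
        ansi.GetString(Encoding.UTF8.GetBytes(s));

    // Shows each UTF-16 code unit of a .NET string as hex, e.g. "00C3 00A9".
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("X4")));

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the Windows ANSI code page.
        Encoding ansi = Encoding.GetEncoding("iso-8859-1");

        string original = "\u00E9";                            // "é"
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);   // C3 A9
        string converted = ToAnsiFromUtf8(original, ansi);     // two chars: U+00C3 U+00A9

        // Assumption: R.NET marshals the string through the ANSI code page,
        // so these would be the bytes R receives.
        byte[] bytesSeenByR = ansi.GetBytes(converted);

        Console.WriteLine(CodeUnits(converted));                  // 00C3 00A9
        Console.WriteLine(BitConverter.ToString(bytesSeenByR));   // C3-A9
        Console.WriteLine(bytesSeenByR.SequenceEqual(utf8Bytes)); // True
    }
}
```

    If your R.NET version marshals strings some other way (for example directly as UTF-8), the same conversion would double-encode the text instead. With the real Windows-1252 code page, a few byte values have no assigned character, so some UTF-8 sequences may not survive the round trip.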

    Here is an ugly example: when building an RDotNet DataFrame, I convert every string in a character column from UTF-8 to ANSI:

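    try
    {
        // Re-encode each value of the column (uneColonne, French for "a column", an object[]) from UTF-8 to ANSI
        string[] colAsStrings = Array.ConvertAll<object, string>(
            uneColonne, s => StringEncodingHelper.EncodeToDefaultFromUTF8((string)s));
        correctedDataArray[i] = colAsStrings;
        columnConverted = true;
    }
    catch (InvalidCastException)
    {
        // Assumed handling: the column holds non-string values, so leave it unconverted
        columnConverted = false;
    }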
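    If you would rather not rely on an exception to detect non-string columns, a variant can inspect each column before converting it. This is only a sketch: `ColumnEncodingSketch`, `ConvertStringColumns`, `Reencode` and the `object[][]` column layout are hypothetical names (they are not part of RDotNet), and ISO-8859-1 again stands in for the ANSI code page.

```csharp
using System;
using System.Text;

public static class ColumnEncodingSketch
{
    // Same idea as EncodeToDefaultFromUTF8, but with the target code page passed
    // in explicitly; null stays null instead of throwing.
    public static string Reencode(string s, Encoding ansi) =>
        s == null ? null : ansi.GetString(Encoding.UTF8.GetBytes(s));

    // Re-encodes every column whose values are all strings (or null);
    // any other column is passed through unchanged.
    public static object[][] ConvertStringColumns(object[][] columns, Encoding ansi)
    {
        var corrected = new object[columns.Length][];
        for (int i = 0; i < columns.Length; i++)
        {
            object[] column = columns[i];
            bool allStrings = Array.TrueForAll(column, v => v == null || v is string);
            corrected[i] = allStrings
                ? Array.ConvertAll(column, v => (object)Reencode((string)v, ansi))
                : column;
        }
        return corrected;
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the Windows ANSI code page.
        Encoding ansi = Encoding.GetEncoding("iso-8859-1");
        object[][] columns =
        {
            new object[] { "caf\u00E9", null },
            new object[] { 1.5, 2.5 },
        };
        object[][] corrected = ConvertStringColumns(columns, ansi);

        Console.WriteLine(((string)corrected[0][0]).Length);          // 5: "é" became two chars
        Console.WriteLine(corrected[0][1] == null);                    // True
        Console.WriteLine(ReferenceEquals(corrected[1], columns[1]));  // True
    }
}
```

    Checking types up front avoids using exceptions for control flow, and it also copes with null entries: the `(string)s` cast lets null through, but `Encoding.UTF8.GetBytes` would then throw an `ArgumentNullException`, which a catch for `InvalidCastException` does not handle.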
    

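    Here is the static method used for the conversion:

    using System.Text;

    public static class StringEncodingHelper
    {
        // Takes the string's UTF-8 bytes and decodes them with the system default (ANSI) code page.
        // Note: on .NET Core, Encoding.Default is UTF-8, so this is effectively a no-op there.
        public static string EncodeToDefaultFromUTF8(string stringToEncode)
        {
            byte[] utf8EncodedBytes = Encoding.UTF8.GetBytes(stringToEncode);
            return Encoding.Default.GetString(utf8EncodedBytes);
        }
    }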
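    The question itself is about strings coming back from R. Assuming the same mechanism in reverse (R sends UTF-8 bytes and they get decoded with the ANSI code page on the C# side), the returned strings would show up as mojibake, such as "Î±" instead of "α", and could be repaired by inverting the helper. The sketch below is hypothetical: `ReturnedStringRepair` and `DecodeUtf8FromAnsi` are illustrative names, ISO-8859-1 is the assumed stand-in code page, and it avoids `Encoding.Default` because that is UTF-8 on .NET Core.

```csharp
using System;
using System.Text;

public static class ReturnedStringRepair
{
    // Reverse of EncodeToDefaultFromUTF8: if UTF-8 bytes from R were decoded with
    // the ANSI code page, re-encode with that code page and decode the bytes as UTF-8.
    public static string DecodeUtf8FromAnsi(string garbled, Encoding ansi) =>
        garbled == null ? null : Encoding.UTF8.GetString(ansi.GetBytes(garbled));

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the Windows ANSI code page.
        Encoding ansi = Encoding.GetEncoding("iso-8859-1");

        // "α" (U+03B1) is CE B1 in UTF-8; decoded as ANSI it shows up as "Î±".
        string garbled = "\u00CE\u00B1";
        string repaired = DecodeUtf8FromAnsi(garbled, ansi);

        Console.WriteLine(repaired == "\u03B1"); // True
    }
}
```

    If the Greek letters arrive as literal '?' characters instead, they were already replaced before reaching C#, and no re-decoding will recover them.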