c# character-encoding rtf

Encoding issue with French language characters when creating RTF document using .NET/C#


The app is developed in .NET and reads an RTF document template containing placeholders that must be replaced with text stored in a SQL Server database. The app then saves the RTF doc with the substituted text. However, French characters read from the database, such as é, are displayed as Ã© in the RTF document.

The process is:

  1. read the RTF doc
  2. replace the placeholders with data from SQL Server db
  3. save to new RTF doc

The key bits of the code I think are...

Read from RTF doc:

StringBuilder buffer;
using (StreamReader input = new StreamReader(pathToTemplate))
{
    buffer = new StringBuilder(input.ReadToEnd());
}

Replace placeholder text with text from database:

buffer.Replace("$$placeholder$$", strFrenchCharsFromDb);

Save the edits as a new RTF doc:

byte[] fileBytes = System.Text.Encoding.UTF8.GetBytes(buffer.ToString());

File.WriteAllBytes(pathToNewRtfDoc, fileBytes);

When I inspect buffer in the debugger during "Save", the é character is present. When I open the RTF after File.WriteAllBytes, it contains Ã© instead.

I have tried specifying the encoding when creating the StreamReader, but the result was the same, i.e. using (StreamReader input = new StreamReader(pathToTemplate, Encoding.UTF8))
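The symptom can be reproduced without any RTF file, which suggests why changing the StreamReader encoding makes no difference: the problem is on the output side. The sketch below is only an illustration under an assumption, namely that the RTF reader decodes bytes above 0x7F with a single-byte ANSI code page such as Windows-1252. ISO-8859-1 stands in for that code page here because it is built into .NET and agrees with Windows-1252 for the two bytes involved. MojibakeDemo and ReadUtf8AsSingleByte are hypothetical names.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes those bytes one byte per
    // character, the way a reader using a single-byte code page would.
    public static string ReadUtf8AsSingleByte(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);           // é becomes 0xC3 0xA9
        Encoding singleByte = Encoding.GetEncoding("iso-8859-1");  // stand-in for the ANSI code page
        return singleByte.GetString(utf8Bytes);                    // 0xC3 -> Ã, 0xA9 -> ©
    }

    public static void Main()
    {
        string seen = ReadUtf8AsSingleByte("\u00E9");   // "\u00E9" is é
        Console.WriteLine(seen == "\u00C3\u00A9");      // True: the reader sees Ã©
    }
}
```

The string in memory is correct, which matches what the debugger shows. The two-character result only appears once the UTF-8 bytes are interpreted one byte at a time.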


Solution

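  • RTF files are expected to contain 7-bit ASCII, and a reader decodes any raw byte above 0x7F with the document's ANSI code page, so the two UTF-8 bytes of é show up as two separate characters. Escape the strFrenchCharsFromDb string with the following method before calling Replace():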

    buffer.Replace("$$placeholder$$", ConvertNonAsciiToEscaped(strFrenchCharsFromDb)); 
    

    The ConvertNonAsciiToEscaped() method implementation:

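    /// <summary>Escapes every non-ASCII character as an RTF \uN? control word.</summary>
    /// <param name="rtf">An RTF string that can contain non-ASCII characters and should be converted to the correct format before it is loaded into the RichTextBox control.</param>
    /// <returns>The source RTF string with non-ASCII characters converted to escaped characters.</returns>
    public string ConvertNonAsciiToEscaped(string rtf)
    {
        var sb = new StringBuilder();
        foreach (var c in rtf)
        {
            if (c <= 0x7f)
                sb.Append(c);
            else
                // RTF expects N in \uN as a signed 16-bit decimal, so code units above 0x7FFF become negative.
                // The trailing '?' is the fallback shown by readers that do not understand \u.
                sb.Append("\\u" + unchecked((short)c).ToString(System.Globalization.CultureInfo.InvariantCulture) + "?");
        }
        return sb.ToString();
    }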
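
    For reference, the three steps from the question (read, replace, save) can be combined with the escaping into one self-contained sketch. This is a minimal illustration, not a drop-in implementation: RtfTemplateSketch, EscapeNonAscii and FillTemplate are hypothetical names, and reading and writing the file as ISO-8859-1 is an assumption chosen so that every byte of the template survives the round trip unchanged.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class RtfTemplateSketch
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same
    // value, so bytes already in the template are written back unchanged.
    private static readonly Encoding ByteTransparent = Encoding.GetEncoding("iso-8859-1");

    // Same rule as ConvertNonAsciiToEscaped: ASCII passes through, everything
    // else becomes \uN? with N written as a signed 16-bit decimal.
    public static string EscapeNonAscii(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c <= 0x7f)
                sb.Append(c);
            else
                sb.Append("\\u")
                  .Append(unchecked((short)c).ToString(CultureInfo.InvariantCulture))
                  .Append('?');
        }
        return sb.ToString();
    }

    // Hypothetical helper: read the template, substitute the escaped value,
    // and save the result to a new file.
    public static void FillTemplate(string templatePath, string outputPath,
                                    string placeholder, string value)
    {
        string rtf = File.ReadAllText(templatePath, ByteTransparent);
        string filled = rtf.Replace(placeholder, EscapeNonAscii(value));
        File.WriteAllText(outputPath, filled, ByteTransparent);
    }

    public static void Main()
    {
        string template = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllText(template, @"{\rtf1\ansi\ansicpg1252 $$placeholder$$}", Encoding.ASCII);

        FillTemplate(template, output, "$$placeholder$$", "caf\u00E9"); // "café"

        Console.WriteLine(File.ReadAllText(output, Encoding.ASCII));
        // prints {\rtf1\ansi\ansicpg1252 caf\u233?}
    }
}
```

    Because the escaped text is pure ASCII, the saved file no longer depends on which code page the RTF reader assumes.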