Tags: c#, asp.net, json, json.net, system.text.json

Issues with System.Text.Json serializing Unicode characters (like emojis)


I am upgrading an application from .NET Core 2.2 to .NET Core 3.0, and the new System.Text.Json serializer does not behave the way Newtonsoft did in 2.2. For characters such as a non-breaking space (\u00A0) or emoji, Newtonsoft (and even Utf8Json) writes the actual characters, whereas System.Text.Json writes Unicode escape sequences.

I've created a simple .NET Fiddle to show this.

var input = new Foo { Bar = "\u00A0 Test !@#$%^&*() 💯\uD83D\uDCAF 你好" };
var newtonsoft = Newtonsoft.Json.JsonConvert.SerializeObject(input);
var system = System.Text.Json.JsonSerializer.Serialize(input, new System.Text.Json.JsonSerializerOptions
{
    Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
});
var utf8Json = Utf8Json.JsonSerializer.ToJsonString(input);

Console.WriteLine($"Original: {input.Bar} - {input.Bar.Contains('\u00A0')}"); // Original
Console.WriteLine($"Newtonsoft: {newtonsoft} - {newtonsoft.Contains('\u00A0')}"); // Works
Console.WriteLine($"System.Text.Json: {system} - {system.Contains('\u00A0')}"); // Does not work
Console.WriteLine($"Utf8Json: {utf8Json} - {utf8Json.Contains('\u00A0')}"); // Works

https://dotnetfiddle.net/erCaZl

Is there an Encoder or a JsonSerializerOptions property to serialize like Newtonsoft did?


Solution

  • This is by design. Our goal is to ship secure defaults, which is why we escape anything that we don't know for a fact is safe. For practical reasons, we can't detect all safe characters, because that would mean shipping large tables and performing potentially non-trivial lookups.

    If you really insist, you can extend the JavaScriptEncoder class and choose the encoded characters yourself. I would advise against this because, if you're not careful, people can sneak in payloads that change the semantics of the JSON.
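If the goal is fewer escapes rather than none, a less drastic option than subclassing may be the built-in JavaScriptEncoder.Create(UnicodeRanges.All). The sketch below is only an illustration under assumptions: the Foo class is a hypothetical stand-in (the question does not show its definition) and EscapingDemo is a made-up helper. As far as I know, this leaves BMP letters such as 你好 unescaped, while characters like \u00A0 and emoji may still be written as \uXXXX sequences. Either way, the escaped text is still valid JSON that should deserialize back to the same string, which is what RoundTrips checks.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical stand-in for the Foo type from the question (its definition is not shown there).
public class Foo
{
    public string Bar { get; set; }
}

public static class EscapingDemo
{
    // Built-in encoder that allows all Unicode ranges covered by UnicodeRanges.
    // Assumption: some characters (for example U+00A0) may still be escaped.
    public static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
    };

    // Serializes a Foo whose Bar property holds the given text.
    public static string Serialize(string value) =>
        JsonSerializer.Serialize(new Foo { Bar = value }, Options);

    // Returns true when the serialized JSON deserializes back to the same string,
    // regardless of which characters were escaped on the way out.
    public static bool RoundTrips(string value)
    {
        string json = Serialize(value);
        Foo parsed = JsonSerializer.Deserialize<Foo>(json);
        return parsed.Bar == value;
    }

    public static void Main()
    {
        string text = "\u00A0 Test 💯 你好";
        Console.WriteLine(Serialize(text));   // inspect which characters are escaped
        Console.WriteLine(RoundTrips(text));  // True: escaping does not change the value
    }
}
```

If consumers parse the output with a conforming JSON parser, the escaped and unescaped forms carry the same value, so the difference from Newtonsoft only matters when something compares or searches the raw JSON text, as the Contains('\u00A0') checks in the question do.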