java unicode gson

Can someone clarify Gson's unicode encoding?


In the following minimalistic example:

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class GsonStuff {

    public static void main(String[] args) {
        GsonBuilder builder = new GsonBuilder();
        Gson gson = builder.create();
        System.out.println(gson.toJson("Apostrophe: '"));
        //Outputs: "Apostrophe: \u0027"
    }   
}

The apostrophe gets replaced by its Unicode escape in the printout. This is not just a display effect: the String returned from the toJson method literally contains the characters '\', 'u', '0', '0', '2', '7'.

Decoding it with a JSON parser actually works and gives the string "Apostrophe: '" rather than "Apostrophe: \u0027". How should I decode it to get the same result?

And an additional question, why doesn't a random unicode character such as ش get encoded similarly?


Solution

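  • By default, Gson Unicode-escapes certain characters, and ' is one of them. (See HTML_SAFE_REPLACEMENT_CHARS in JsonWriter for the complete list.) The escaped and unescaped forms are equivalent JSON, so any JSON parser, including gson.fromJson, decodes \u0027 back to '. Characters that are not in that list, such as ش, are valid inside a JSON string as-is, so they are written unescaped.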

    To disable this escaping, call the following before builder.create():

    builder.disableHtmlEscaping();
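
To make the behaviour concrete, here is a rough, JDK-only sketch of what HTML-safe escaping amounts to. It is a simplified model, not Gson's actual code: the class HtmlSafeEscapeSketch and the method toJsonString are hypothetical names, and the sketch assumes the extra HTML-safe characters are <, >, &, = and '. It also writes every control character as a numeric escape, which is valid JSON but not necessarily the exact form Gson picks.

```java
public class HtmlSafeEscapeSketch {

    // Hypothetical helper: renders a Java string as a JSON string literal.
    // With htmlSafe set, the characters < > & = ' are also written as
    // numeric escapes, mimicking Gson's default output.
    public static String toJsonString(String value, boolean htmlSafe) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' || c == '\\') {
                // Always escaped, otherwise the literal would be malformed.
                out.append('\\').append(c);
            } else if (c < 0x20 || (htmlSafe && "<>&='".indexOf(c) >= 0)) {
                // Control characters, plus the HTML-sensitive ones on request.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                // Everything else, including non-ASCII letters, passes through.
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonString("Apostrophe: '", true));
        System.out.println(toJsonString("Apostrophe: '", false));
    }
}
```

With htmlSafe set to true the apostrophe comes out as \u0027, matching the output in the question. With false it stays a plain ', which corresponds to what disableHtmlEscaping() gives you. A string such as "ش" passes through unchanged in both modes. For decoding you should not need anything special: gson.fromJson(json, String.class) accepts either form and returns the same String.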