Tags: java, json, jackson, jackson-databind

Why did I have to use `ESCAPE_NON_ASCII` to parse µ?


From service Sender, I'm sending a JSON payload with the character µ (code point U+00B5, encoded in UTF-8 as the two bytes 0xC2 0xB5) inside one of the strings, like so:

{
   "apiserver.latency.k8s.io/response-write":"1.446µs"
}
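To make the encoding point concrete, here is a small JDK-only sketch (the class name `MicroSignBytes` is illustrative). A Java `String` holds characters, not bytes; the bytes are fixed only when the text is encoded with some charset, and µ comes out differently depending on which one is used:

```java
import java.nio.charset.StandardCharsets;

public class MicroSignBytes {

    // Formats bytes as space-separated uppercase hex, e.g. "C2 B5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String micro = "\u00B5"; // the micro sign (U+00B5) as a one-char Java string

        // The same one-character string yields different bytes per charset.
        System.out.println("UTF-8:      " + hex(micro.getBytes(StandardCharsets.UTF_8)));      // C2 B5
        System.out.println("ISO-8859-1: " + hex(micro.getBytes(StandardCharsets.ISO_8859_1))); // B5
    }
}
```

So a lone 0xB5 byte on the wire means the text was encoded as ISO-8859-1 (Latin-1), not UTF-8.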

I'm setting the Content-Type header to "application/json;charset=UTF-8" on the request.

This string is initially held in a POJO, so to send the payload to the Receiver service I'm using Jackson to convert it to a String, like so:

// myPojo is an object containing the µ in one of its members
// writeValueAsString returns a Java String; no byte encoding has happened yet
String body = objectMapper.writeValueAsString(myPojo);
ResponseEntity<String> response = restClient.postForEntity(url, body,
                    String.class);

However, the Receiver service rejects the request, throwing the following exception:

com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xb5
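The error can be reproduced without Jackson. The JDK-only sketch below (hypothetical class `StrictUtf8Check`) uses a strict UTF-8 decoder which, like the Receiver's parser, refuses malformed input. It rejects the ISO-8859-1 bytes of the payload and accepts the UTF-8 bytes, and it shows why 0xB5 specifically cannot start a UTF-8 character:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting
    // the replacement character, similar to a strict JSON parser.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // 0xB5 is 10110101: the 10xxxxxx bit pattern marks a UTF-8
    // continuation byte, which can never begin a character.
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        String json = "{\"apiserver.latency.k8s.io/response-write\":\"1.446\u00B5s\"}";

        byte[] latin1 = json.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = json.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1));             // false
        System.out.println(isValidUtf8(utf8));               // true
        System.out.println(isContinuationByte((byte) 0xB5)); // true
    }
}
```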

If I configure the ObjectMapper with the following:

objectMapper.getFactory().configure(JsonWriteFeature.ESCAPE_NON_ASCII.mappedFeature(), true);

The Receiver service then parses the request successfully and lets it through.
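Enabling `ESCAPE_NON_ASCII` sidesteps the problem because every character above U+007F is written as a JSON `\uXXXX` escape, so the output is pure ASCII, and ASCII text encodes to identical bytes in ISO-8859-1 and UTF-8. The helper below (`escapeNonAscii`, hypothetical) only mimics that effect to illustrate the point; it is not Jackson's implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EscapeNonAsciiSketch {

    // Hypothetical helper: replaces every char above 0x7F with a JSON
    // unicode escape (backslash, 'u', four hex digits). It illustrates the
    // effect of ESCAPE_NON_ASCII only; it does not escape quotes,
    // backslashes or control characters like a real JSON writer would.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "1.446\u00B5s";
        String escaped = escapeNonAscii(value);
        System.out.println(escaped); // the micro sign now appears as a six-character escape

        // Pure ASCII text yields identical bytes in UTF-8 and ISO-8859-1...
        System.out.println(Arrays.equals(
                escaped.getBytes(StandardCharsets.UTF_8),
                escaped.getBytes(StandardCharsets.ISO_8859_1))); // true
        // ...while the unescaped value does not.
        System.out.println(Arrays.equals(
                value.getBytes(StandardCharsets.UTF_8),
                value.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

In other words, the setting does not fix the charset mismatch; it just makes the payload immune to it.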

However, from what I've read, Jackson always produces valid JSON. So why do I have to enable the above setting for the request to go through? Furthermore, µ is a valid character in UTF-8, which is Jackson's default encoding, so why is parsing failing?

This is how I'm reading the payload in a custom deserializer:

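import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonNode;

import java.io.IOException;

public class JsonToStringValueDeserializer extends JsonDeserializer<String> {

    private static final TypeReference<JsonNode> JSON_NODE_TYPE = new TypeReference<>(){};

    @Override
    public String deserialize(JsonParser p, DeserializationContext ctx) throws IOException, JsonProcessingException {
        String retVal;

        if (p.currentToken().isStructStart()) {
            // Object or array: read it as a tree and re-serialize it to a JSON string
            JsonNode val = p.readValueAs(JSON_NODE_TYPE);
            retVal = val.toString();
        } else {
            // Scalar (e.g. the base64 string from Sender2): read it as-is
            retVal = p.readValueAs(String.class);
        }

        return retVal;
    }
}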

I do this because I want this part of the payload to be deserialized as a String in Receiver, regardless of how the sender originally sent it (another service, Sender2, sends this part of the payload as a base64 string).


Solution

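  • User g00se's comments pointed me in the right direction: our sending service is based on Spring, and in our Spring version (and, at the time of writing, by default) Spring's StringHttpMessageConverter encodes String payloads as ISO-8859-1. ISO-8859-1 encodes µ as the single byte 0xB5, which in UTF-8 can only be a continuation byte, never a start byte, so the Receiver's UTF-8 parser fails with "Invalid UTF-8 start byte 0xb5". With ESCAPE_NON_ASCII enabled, Jackson writes µ as a \uXXXX escape instead, which is plain ASCII and therefore produces the same bytes in ISO-8859-1 and UTF-8.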
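The underlying fix is to make the body bytes and the declared charset agree. In Spring, one option is to register a `StringHttpMessageConverter` constructed with `StandardCharsets.UTF_8`. The sketch below swaps in the JDK's `java.net.http` client (Java 11+) instead of Spring, purely to illustrate the principle; `Utf8RequestSketch`, `buildJsonPost` and the URL are hypothetical placeholders. It builds the request without sending it:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class Utf8RequestSketch {

    // Builds (but does not send) a POST whose body bytes and declared
    // charset agree: both are UTF-8.
    static HttpRequest buildJsonPost(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json; charset=UTF-8")
                // Encode explicitly as UTF-8 instead of relying on a
                // converter's default charset.
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        String json = "{\"apiserver.latency.k8s.io/response-write\":\"1.446\u00B5s\"}";
        HttpRequest request = buildJsonPost("http://localhost:8080/receiver", json);

        System.out.println(request.headers().firstValue("Content-Type").orElse("none"));
        // The body length counts UTF-8 bytes, where the micro sign takes two.
        long expected = json.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(request.bodyPublisher().orElseThrow().contentLength() == expected); // true
    }
}
```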