Tags: rust, unicode, escaping, serde-json

How to correctly parse JSON with Unicode escape sequences?


use serde_json::json; // 1.0.66
use std::str;

fn main() {
    let input = "{\"a\": \"b\\u001fc\"}";
    let bytes = input.as_bytes();
    let json: serde_json::Value = serde_json::from_slice(bytes).unwrap();
    for (_k, v) in json.as_object().unwrap() {
        let vec = serde_json::to_vec(v).unwrap();
        let utf8_str = str::from_utf8(&vec).unwrap();
        println!("value: {}", v);
        println!("utf8_str: {}", utf8_str);
        println!("bytes: {:?}", vec);
    }
}

How can the value of object key "a" be transformed into the following string?

b\u{1f}c

I've tried serde_json and str::from_utf8, but I always get "b\u001fc" as the result. The escape sequence is not decoded into the character it represents. How can this be solved?


Solution

  • The problem is this line:

    let vec = serde_json::to_vec(v).unwrap();
    

    From the serde_json docs on to_vec():

    Serialize the given data structure as a JSON byte vector.

    You are deserializing from JSON, taking the values of the object, serializing them back to JSON, and printing that. Serializing escapes the control character again, which is why you still see \u001f. Instead of serializing back to JSON, read the already-decoded string out of the Value with as_str(), like this:

    fn main() {
        let input = "{\"a\": \"b\\u001fc\"}";
        let bytes = input.as_bytes();
        let json: serde_json::Value = serde_json::from_slice(bytes).unwrap();
        for (_k, v) in json.as_object().unwrap() {
            let string = v.as_str().unwrap();
            // {:?} prints the control character in Rust's escaped form: "b\u{1f}c"
            println!("string: {:?}", string);
        }
    }
    
