
How to parse malformed JSONP with hex-encoded characters using JSON.NET?


I call Google's dictionary API like this:

var json = new WebClient().DownloadString(string.Format(@"http://www.google.com/dictionary/json?callback=dict_api.callbacks.id100&q={0}&sl=en&tl=en", "bar"));

However, I get a response that this code fails to parse:

json = json.Replace("dict_api.callbacks.id100(", "").Replace(",200,null)", "");
JObject o = JObject.Parse(json);

The parse fails when it encounters this:

"entries":[{"type":"example","terms":[{"type":"text","text":"\x3cem\x3ebars\x3c/em\x3e of sunlight shafting through the broken windows","language":"en"}]}]}

The \x3cem\x3ebars\x stuff is what kills the parse.

Is there some way to handle this JSONP response with JSON.NET?

The answer by aquinas to another "Parse JSONP" question shows a nice regex, x = Regex.Replace(x, @"^.+?\(|\)$", "");, for stripping the JSONP wrapper (it may need tweaking for this case), so the main question here is how to deal with the hex-encoded characters.


Solution

  • Reference: How to decode HTML encoded character embedded in a json string

    The JSON spec does not allow hexadecimal escape sequences (\xNN) in strings, only Unicode escape sequences (\uNNNN). That is why the parser rejects \x3c, and why the equivalent \u003c should work. So you can simply replace \x with \u00. A blind replacement should work fine on this response, although in theory it would also alter a literal backslash followed by x (written \\x in JSON text).

    Changing your code to this fixes it:

            var json = new WebClient().DownloadString(string.Format(@"http://www.google.com/dictionary/json?callback=dict_api.callbacks.id100&q={0}&sl=en&tl=en", "bar"));

            // Strip the JSONP wrapper, then rewrite \xNN escapes as \u00NN so the text is valid JSON.
            json = json
                    .Replace("dict_api.callbacks.id100(", "")
                    .Replace(",200,null)", "")
                    .Replace("\\x","\\u00");

            JObject o = JObject.Parse(json);
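
    If the blind replace is a concern, a regex can skip escaped backslashes and only rewrite real \xNN escapes. The following is a rough sketch, not a tested drop-in: the class name JsonpCleaner and its two methods are hypothetical, and the wrapper pattern assumes the response always ends in ,200,null) as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of JSON.NET); uses only System.Text.RegularExpressions.
static class JsonpCleaner
{
    // Matches the leading "callbackName(" or the trailing ",200,null)".
    // Assumption: the trailing arguments are always ",200,null" as in the question.
    static readonly Regex Wrapper = new Regex(@"^[^(]*\(|,200,null\)\s*$");

    // Matches either an escaped backslash (left alone) or a \xNN hex escape.
    static readonly Regex HexEscape = new Regex(@"\\\\|\\x([0-9A-Fa-f]{2})");

    // Removes the JSONP callback wrapper, leaving the JSON payload.
    public static string StripWrapper(string jsonp)
    {
        return Wrapper.Replace(jsonp, "");
    }

    // Rewrites \xNN as \u00NN; an escaped backslash followed by 'x' is kept as-is.
    public static string FixHexEscapes(string json)
    {
        return HexEscape.Replace(json, m =>
            m.Groups[1].Success ? @"\u00" + m.Groups[1].Value : m.Value);
    }
}
```

    The result of JsonpCleaner.FixHexEscapes(JsonpCleaner.StripWrapper(json)) can then be passed to JObject.Parse as before.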