Tags: javascript, c#, json, .net-4.0, fastjson

Most efficient way to fix invalid JSON


I am stuck in an impossible situation. I have JSON from outer space (there is no way they are going to change it). Here is the JSON:

{
    user:'180111',
    title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n\'',
    date:'2007/01/10 19:48:38',
    "id":"3322121",
    "previd":112211,
    "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
    "from":"112221",
    "username":"mikethunder",
    "creationdate":"2007\/01\/10 14:04:49"
}

"It is nowhere near valid JSON," I said. Their response was "Emmm! But JavaScript can read it without complaining":

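<html>
<body>
<script type="text/javascript">
    // The object literal from above, pasted in as-is; the browser accepts it as JavaScript.
    var obj = {
        user:'180111',
        title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n\'',
        date:'2007/01/10 19:48:38',
        "id":"3322121",
        "previd":112211,
        "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
        "from":"112221",
        "username":"mikethunder",
        "creationdate":"2007\/01\/10 14:04:49"
    };

    document.write(obj.title);
    document.write("<br />");
    document.write(obj.creationdate + " " + obj.date);
    document.write("<br />");
    document.write(obj.body);
    document.write("<br />");
</script>
</body>
</html>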
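That exchange comes down to the gap between JSON syntax and JavaScript syntax. Below is a minimal sketch in Node.js (Node is an assumption; the question uses a browser) using an abridged piece of the literal. Strict `JSON.parse` rejects the bare key and the single-quoted strings, but evaluating the same text as a JavaScript expression produces an ordinary object. `tryJsonParse` is a hypothetical helper written for illustration. `new Function` runs the text as code, so only do this with trusted input.

```javascript
// Abridged from the question's input, kept as raw text (Node.js assumed).
const text = String.raw`{ user:'180111', title:'I\'m sure', "previd":112211 }`;

// Hypothetical helper: returns "ok" if strict JSON parsing succeeds, otherwise the error's name.
function tryJsonParse(s) {
  try {
    JSON.parse(s);
    return "ok";
  } catch (e) {
    return e.name;
  }
}

console.log(tryJsonParse(text)); // SyntaxError

// The same text, evaluated as a JavaScript expression, is an ordinary object.
// new Function runs the text as code, so only do this with trusted input.
const obj = new Function("return (" + text + ");")();
console.log(obj.user, obj.title, obj.previd); // 180111 I'm sure 112211
```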

Problem

I am supposed to read and parse this string in .NET 4, and it broke 3 of the 14 libraries listed in the C# section of json.org (I didn't try the rest). To make the problem go away, I wrote the following function to fix the issues with single and double quotes.

using System.Text;

public static class JsonFixer {
    public static string JSONBeautify(string InStr){
        bool inSingleQuote = false;
        bool inDoubleQuote = false;
        bool escaped = false;

        StringBuilder sb = new StringBuilder(InStr);
        sb = sb.Replace("`", "<°)))><"); // Replace every grave accent with a "fish" so the grave accent is free to use as a marker later.
                                          // Hopefully there is no "fish" in our JSON.
        for (int i = 0; i < sb.Length; i++) {
            switch (sb[i]) {
                case '\\':
                    escaped = !escaped;     // A backslash escapes the next character unless it is itself escaped
                    break;
                case '\'':
                    if (!inSingleQuote && !inDoubleQuote) {
                        sb[i] = '"';            // Change an opening single-quote string marker to a double quote
                        inSingleQuote = true;
                    } else if (inSingleQuote && !escaped) {
                        sb[i] = '"';            // Change a closing single-quote string marker to a double quote
                        inSingleQuote = false;
                    } else if (escaped) {
                        escaped = false;
                    }
                    break;
                case '"':
                    if (!inSingleQuote && !inDoubleQuote) {
                        inDoubleQuote = true;   // This is an opening double-quote string marker
                    } else if (inSingleQuote && !escaped) {
                        sb[i] = '`';            // Mark an unescaped double quote inside a single-quoted string with a grave accent
                    } else if (inDoubleQuote && !escaped) {
                        inDoubleQuote = false;  // This is a closing double-quote string marker
                    } else if (escaped) {
                        escaped = false;
                    }
                    break;
                default:
                    escaped = false;
                    break;
            }
        }
        return sb.ToString()
            .Replace("\\/", "/")        // Unescape every \/ to /. Hopefully there are no smileys in the strings
            .Replace("`", "\\\"")       // Change every grave accent to an escaped double quote \"
            .Replace("<°)))><", "`")    // Change every fish back to a grave accent
            .Replace("\\'", "'");       // Change every escaped single quote to a plain single quote
    }
}
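One way to probe the function for breakage is to port the `Replace` chain at the end of `JSONBeautify` to JavaScript and feed it edge cases. The port is an assumption made only for testability, since the function itself is C#; C#'s `String.Replace(string, string)` replaces every occurrence, and `split`/`join` reproduces that. Tracing the character loop by hand, it leaves both inputs below untouched because they contain no single-quoted strings, so only the chain acts on them. Both inputs are already valid JSON, and the chain damages both.

```javascript
// JavaScript port (for testing only) of the Replace chain that ends JSONBeautify.
function postProcess(s) {
  return s
    .split("\\/").join("/")         // .Replace("\\/", "/")
    .split("`").join('\\"')         // .Replace("`", "\\\"")
    .split("<°)))><").join("`")     // .Replace("<°)))><", "`")
    .split("\\'").join("'");        // .Replace("\\'", "'")
}

// 1. An escaped backslash followed by "/" silently loses its backslash.
const slash = String.raw`{"p":"a\\/b"}`;
console.log(JSON.parse(slash).p);              // a\/b
console.log(JSON.parse(postProcess(slash)).p); // a/b

// 2. An escaped backslash followed by "'" becomes \', which is not a valid JSON escape.
const quote = String.raw`{"p":"a\\'b"}`;
try {
  JSON.parse(postProcess(quote));
} catch (e) {
  console.log(e.name); // SyntaxError
}
```

Because `\/` is already a legal JSON escape, the first `Replace` is not needed for parsing. The `\'` replacement is needed, but it only becomes safe if it runs inside the character loop, where the escape state is known.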

Now JSONLint only complains about the unquoted attribute names, and I can use both the JSON.NET and SimpleJSON libraries to parse the JSON above.

Question

I am sure my code is not the best way to fix this JSON. Is there any scenario in which my code might break? Is there a better way of doing this?
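Another approach avoids placeholder characters entirely: walk the text once with a small hand-written tokenizer. Each string literal is decoded using JavaScript escape rules and re-emitted with `JSON.stringify`, and bare keys are quoted along the way. This is a different technique from both the function above and the answer below. The sketch is in JavaScript so it can be tested, and `looseToJson` is a hypothetical name. It assumes the input uses only the constructs in this sample. Comments, `\xHH` escapes, hex or leading-dot numbers are not handled, and any bare word other than `true`, `false`, or `null` is assumed to be a key.

```javascript
// Hypothetical helper (not a library API): converts a JavaScript-style object
// literal into strict JSON without executing it. Handles only what the sample
// uses: '...' and "..." strings, bare keys, plain decimal numbers, punctuation.
function looseToJson(src) {
  const ESCAPES = { n: "\n", t: "\t", r: "\r", b: "\b", f: "\f", v: "\v", 0: "\0" };
  const NUMBER = /\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/y; // sticky: must match at lastIndex
  const IDENT = /[A-Za-z_$][A-Za-z0-9_$]*/y;
  let out = "";
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "'" || ch === '"') {
      // Decode the string body with JavaScript escape rules.
      let value = "";
      i++;
      while (i < src.length && src[i] !== ch) {
        if (src[i] === "\\") {
          const next = src[i + 1];
          if (next === "u") {
            value += String.fromCharCode(parseInt(src.slice(i + 2, i + 6), 16));
            i += 6;
          } else {
            // \n, \t, ... become control characters; \' \" \\ \/ become the character itself.
            value += next in ESCAPES ? ESCAPES[next] : next;
            i += 2;
          }
        } else {
          value += src[i++];
        }
      }
      if (i >= src.length) throw new SyntaxError("Unterminated string at end of input");
      i++; // skip the closing quote
      out += JSON.stringify(value); // re-encode as a double-quoted JSON string
    } else if (/\d/.test(ch)) {
      NUMBER.lastIndex = i;
      const num = NUMBER.exec(src)[0];
      out += num;
      i += num.length;
    } else if (/[A-Za-z_$]/.test(ch)) {
      IDENT.lastIndex = i;
      const word = IDENT.exec(src)[0];
      // Keep JSON literals; quote every other bare word, assuming it is a key.
      out += ["true", "false", "null"].includes(word) ? word : JSON.stringify(word);
      i += word.length;
    } else {
      out += ch; // braces, brackets, commas, colons, minus signs, whitespace
      i++;
    }
  }
  return out;
}

const raw = String.raw`{ user:'180111', title:'I\'m sure', "previd":112211 }`;
console.log(looseToJson(raw)); // { "user":"180111", "title":"I'm sure", "previd":112211 }
```

Because every string is fully decoded before it is re-encoded, escaped backslashes, `\/`, `\'`, and embedded double quotes are handled by the same rule, with no "fish" marker that could collide with real data. Unlike evaluating the text, this never executes the input.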


Solution

  • You need to run this through JavaScript. Fire up a JavaScript engine in .NET, give it the string as input, and use JavaScript's native JSON.stringify to convert it:

    var obj = {
        "user":'180111',
        "title":'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n',
        "date":'2007/01/10 19:48:38',
        "id":"3322121",
        "previd":"112211",
        "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
        "from":"112221",
        "username":"mikethunder",
        "creationdate":"2007\/01\/10 14:04:49"
    }
    
    console.log(JSON.stringify(obj));
    document.write(JSON.stringify(obj));

    Remember that the string (or rather object) you've got isn't valid JSON, so a JSON library can't parse it; it has to be converted to valid JSON first. It is, however, valid JavaScript.

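    To complete this answer: you can also use JavaScriptSerializer in .NET. For this solution you'll need the following namespaces (JavaScriptSerializer lives in the System.Web.Extensions assembly, so the project needs a reference to it):

    • System.Net
    • System.Web.Script.Serialization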

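      using System.Collections.Generic;
      using System.Net;                          // WebClient
      using System.Web.Script.Serialization;    // JavaScriptSerializer

      string readHtml;
      using (var webClient = new WebClient())
          readHtml = webClient.DownloadString("uri to your source (extraterrestrial)");
      var serializer = new JavaScriptSerializer();
      Dictionary<string, object> results = serializer.Deserialize<Dictionary<string, object>>(readHtml);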
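The evaluate-then-stringify idea from the first part of this answer can be sketched outside the browser as well. The example below uses Node.js, which is an assumption; in .NET you would host a JavaScript engine instead. `jsLiteralToJson` is a hypothetical name, and the input is abridged from the question. The design trade-off is that this route executes the input as code, which matters for data "from outer space". JavaScriptSerializer, by contrast, only parses the text.

```javascript
// Sketch (Node.js assumed): evaluate the literal as JavaScript, then let
// JSON.stringify emit strict JSON.
// WARNING: new Function executes the input as code; use it only on trusted text.
function jsLiteralToJson(src) {
  const value = new Function('"use strict"; return (' + src + ");")();
  return JSON.stringify(value);
}

// Abridged from the question's input.
const src = String.raw`{
    user:'180111',
    title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\'',
    "previd":112211,
    "body":"\'You\' can \"read\" more here [\/url]'s. Cheers \\*/ :\/"
}`;

const json = jsLiteralToJson(src);
const parsed = JSON.parse(json); // a strict JSON parser now accepts the text
console.log(parsed.user, parsed.previd); // 180111 112211
```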