node.js, google-search-api

Google Custom Search API returning invalid JSON?


I'm calling the Google Custom Search API from Node.js to try it out. The results come back fine, but when I parse the JSON with JSON.parse(dataFromGoogle), I get illegal token errors on a number of elements (HTML titles and snippets; the HTML titles contain Unicode escape sequences, but I'm not sure what's wrong with the snippets). I can ask Google not to send the HTML titles, but I really need the snippets!

Is there a good workaround for this, or should I plan on preprocessing the response to strip out illegal characters manually?

Edit: added the console output:

searching for "small business" using Google

    {
     "kind": "customsearch#search",
     "url": {
      "type": "application/json",
      "template": "https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&hr={language?}&safe={safe?}&cx={cx?}&cref={cref?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&alt=json"
     },
     "queries": {
      "nextPage": [
       {
        "title": "Google Custom Search - small business",
        "totalResults": "42300",
        "searchTerms": "small business",
        "count": 10,
        "startIndex": 11,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "my_token"
       }
      ],
      "request": [
       {
        "title": "Google Custom Search - small business",
        "totalResults": "42300",
        "searchTerms": "small business",
        "count": 10,
        "startIndex": 1,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "my_token"
       }
      ]
     },
     "context": {
      "title": "IR
    undefined:60
    "htmlTitle": "\u003cb\u003eSmall Business\u003c/b\u003e Health Care Tax Cre
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    SyntaxError: Unexpected token ILLEGAL
        at Object.parse (native)
        at IncomingMessage.<anonymous> (/Users/pvencill/workspace/irslab/lib/searchEngine.js:44:35)
        at IncomingMessage.emit (events.js:64:17)
        at HTTPParser.onBody (http.js:119:42)
        at CleartextStream.ondata (http.js:1213:22)
        at CleartextStream._push (tls.js:291:27)
        at SecurePair._cycle (tls.js:565:20)
        at EncryptedStream.write (tls.js:97:13)
        at Socket.ondata (stream.js:40:26)
        at Socket.emit (events.js:64:17)


Solution

  • Wow, so it turns out I totally misunderstood what the error was telling me. The fact that it was happening on Unicode-containing fields was a coincidence. The real issue was that I was calling JSON.parse inside the .on("data", ...) handler, which receives only one chunk of the response at a time, so the text being parsed can end partway through the JSON document. The proper way to handle it is to accumulate the body in the "data" handler and parse it in the on("end") handler.

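            var https = require('https');

            // `options` and `callback` come from the enclosing function.
            var message = "";
            https.get(options, function(res){
                res.setEncoding('utf8');

                // Each 'data' event delivers only part of the body, so just accumulate it.
                res.on('data', function(data){
                    message += data;
                });

                // The body is complete once 'end' fires, so it is safe to parse here.
                res.on('end', function(){
                    if(callback){
                        var data = JSON.parse(message);
                        data.items = data.items || [];
                        callback(data);
                    }
                });

                res.on('error', function(error){
                    console.log("ERROR: " + error.message);
                });
            }).on('error', function(error){
                // Connection-level failures are emitted on the request object.
                console.log("ERROR: " + error.message);
            });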
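
To see why parsing inside the "data" handler fails, here is a minimal sketch that simulates chunked delivery without any network access. The sample body, the 40-character chunk size, and the helper names splitIntoChunks and tryParse are assumptions made for illustration; real chunk boundaries depend on the server and the network.

```javascript
// Hypothetical sample body, shaped like a small piece of a Custom Search response.
// The doubled backslashes keep the \u003c escapes literal, as they appear on the wire.
const body = '{"kind":"customsearch#search","items":[{"htmlTitle":' +
  '"\\u003cb\\u003eSmall Business\\u003c/b\\u003e Health Care Tax Credit"}]}';

// Cut the text into fixed-size pieces to mimic 'data' events (illustrative helper).
function splitIntoChunks(text, size) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Wrap JSON.parse so a failure is reported instead of thrown (illustrative helper).
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

const chunks = splitIntoChunks(body, 40);

// Parsing each piece on its own fails, because no piece is a complete JSON document.
const perChunk = chunks.map(tryParse);
console.log(perChunk.map(function (result) { return result.ok; }));

// Parsing the reassembled body succeeds, and the \u003c escapes decode normally.
const whole = tryParse(chunks.join(""));
console.log(whole.ok, whole.value.items[0].htmlTitle);
```

The Unicode escapes are valid JSON on their own; the parse only breaks when a chunk boundary cuts the document, which is why the error seemed to point at those fields.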