
How to index a JSON document directly from memory


I'm trying to index a JSON document that is already in memory, but it isn't working. So far I have tried the solutions posted in https://developer.ibm.com/answers/questions/361808/adding-a-json-document-to-a-discovery-collection-u/ without success.

If I try:

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: JSON.stringify({
            "ocorrencia_id": 9001
        })
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

It returns this error:

    The Media Type [text/plain] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .

On the other hand, if I try:

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: JSON.parse(JSON.stringify({
            "ocorrencia_id": 9001
        }))
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

I get this error:

    TypeError: source.on is not a function
        at Function.DelayedStream.create (C:\Temp\teste-watson\watson-orchestrator\node_modules\delayed-stream\lib\delayed_stream.js:33:10)
        at FormData.CombinedStream.append (C:\Temp\teste-watson\watson-orchestrator\node_modules\combined-stream\lib\combined_stream.js:43:37)
        at FormData.append (C:\Temp\teste-watson\watson-orchestrator\node_modules\form-data\lib\form_data.js:68:3)
        at appendFormValue (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:324:21)
        at Request.init (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:337:11)
        at new Request (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:130:8)
        at request (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\index.js:54:10)
        at createRequest (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\lib\requestwrapper.js:177:10)
        at DiscoveryV1.addDocument (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\discovery\v1.js:516:10)
        at client.query.then.res (C:\Temp\teste-watson\watson-orchestrator\populate\populate.js:36:13)
        at process._tickCallback (internal/process/next_tick.js:109:7)

Similarly, if I save it to a temp file and then read it back:

    const fs = require('fs');
    const tempy = require('tempy');
    const f = tempy.file({extension: 'json'});
    fs.writeFileSync(f, JSON.stringify({
        "ocorrencia_id": 9001
    }));

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: fs.readFileSync(f)
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

Then this happens:

    The Media Type [application/octet-stream] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .

Since other posts recommend using JSON.parse(), it seems the API accepts a JS object, but none of the examples, nor anything else I have tried so far, works. Is this a bug?
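One way to see why the three attempts fail differently is to inspect what each one actually passes as `file`. The sketch below uses only Node's standard library; `describeFileValue` is a hypothetical helper, not part of the Watson SDK. It suggests that none of the values is a stream or carries a `.path`, and that the `JSON.parse()` round trip just yields a plain object with no `.on()` method, which would be consistent with the `source.on is not a function` error raised inside `form-data`.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of the Watson SDK): report what kind of
// value is being handed to `file`.
function describeFileValue(value) {
  if (typeof value === 'string') return 'string';
  if (Buffer.isBuffer(value)) return 'buffer';
  if (value instanceof Readable) return 'stream';
  if (value !== null && typeof value === 'object') return 'plain object';
  return typeof value;
}

const doc = { ocorrencia_id: 9001 };

// The values passed as `file` in the three attempts above.
const attempts = {
  stringify: JSON.stringify(doc),             // 1st attempt: a plain string
  roundTrip: JSON.parse(JSON.stringify(doc)), // 2nd attempt: just a copy of the object
  readFile: Buffer.from(JSON.stringify(doc)), // 3rd attempt: readFileSync() returns a Buffer
};

for (const [name, value] of Object.entries(attempts)) {
  // A stream would expose .on(); a file stream would also expose .path.
  const hasOn = value != null && typeof value.on === 'function';
  const hasPath = value !== null && typeof value === 'object' && 'path' in value;
  console.log(`${name}: ${describeFileValue(value)}, has .on(): ${hasOn}, has .path: ${hasPath}`);
}
```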

Update: saving into a temp file and then using fs.createReadStream() instead of readFileSync() works, but it is still a big bother having to go through the disk for data that's already in memory.
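The temp-file workaround from the update can at least be kept in one place. This is a minimal sketch using only Node's standard library; `jsonTempFileStream` and the `document.json` name are hypothetical, and it assumes the SDK takes the filename from the stream's `path`, which is what the working `createReadStream()` case suggests. It also removes the temp directory once the stream is closed.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `obj` into a fresh temp directory as
// <basename>.json and return a read stream over it. fs.createReadStream()
// exposes the file path as `stream.path`, so the ".json" extension travels
// with the stream.
function jsonTempFileStream(obj, basename = 'document') {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'discovery-'));
  const file = path.join(dir, `${basename}.json`);
  fs.writeFileSync(file, JSON.stringify(obj));

  const stream = fs.createReadStream(file);
  // Clean up once the stream has been consumed (or destroyed).
  stream.on('close', () => fs.rmSync(dir, { recursive: true, force: true }));
  return stream;
}

const stream = jsonTempFileStream({ ocorrencia_id: 9001 });
console.log(path.basename(stream.path)); // document.json

// Usage with the call from the question (not run here):
// discovery.addDocument({ environment_id, collection_id, file: stream }, callback);
```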

I have also tried to create an in-memory stream from a Readable, but that fails, too:

    const { Readable } = require('stream');
    // all data is pushed up front, so read() can be a no-op
    const s = new Readable({ read() {} });
    s.push(JSON.stringify({
        "ocorrencia_id": 9001
    }));
    s.push(null);

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: s
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

This one fails with:

    Error: Unexpected end of multipart data
        at Request._callback (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\lib\requestwrapper.js:88:15)
        at Request.self.callback (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:188:22)
        at emitTwo (events.js:106:13)
        at Request.emit (events.js:191:7)
        at Request.<anonymous> (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:1171:10)
        at emitOne (events.js:96:13)
        at Request.emit (events.js:188:7)
        at Gunzip.<anonymous> (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:1091:12)
        at Gunzip.g (events.js:292:16)
        at emitNone (events.js:91:20)
        at Gunzip.emit (events.js:185:7)
        at endReadableNT (_stream_readable.js:974:12)
        at _combinedTickCallback (internal/process/next_tick.js:80:11)
        at process._tickCallback (internal/process/next_tick.js:104:9) code: 500, error: 'Unexpected end of multipart data'

Solution

  • The service checks the filename first and then the content to determine the media type, but it doesn't seem to recognize JSON content correctly: it just sees text. The other answer will work as long as the filename ends in .json (the service does not care about the contentType).

    That said, we added .addJsonDocument() and .updateJsonDocument() methods to the node.js SDK to make it even easier:

    discovery.addJsonDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
    
        // note: no JSON.stringify needed with addJsonDocument()
        file: { 
            "ocorrencia_id": 9001
        }
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }
    
        console.log(data);
    });
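If you want to stay entirely in memory with plain `addDocument()`, one option consistent with the point about filenames is to give the multipart field an explicit filename ending in `.json`. The stack trace in the question shows the SDK handing `file` to `request`'s `appendFormValue()`, and `request` accepts a `{ value, options: { filename, contentType } }` object for a form field; whether this SDK version passes such an object through unchanged is an assumption. `jsonFileField` is a hypothetical helper.

```javascript
// Hypothetical helper: wrap an in-memory object as a request-style multipart
// value, { value, options: { filename, contentType } }. Per the answer above,
// the ".json" filename is what matters to the service, not the contentType.
function jsonFileField(obj, filename = 'document.json') {
  if (!filename.endsWith('.json')) {
    throw new Error('filename should end in .json so the service detects JSON');
  }
  return {
    value: Buffer.from(JSON.stringify(obj), 'utf8'),
    options: { filename, contentType: 'application/json' },
  };
}

const field = jsonFileField({ ocorrencia_id: 9001 });
console.log(field.options.filename, field.value.toString('utf8'));
// prints: document.json {"ocorrencia_id":9001}

// Usage (assumption, not run here):
// discovery.addDocument({ environment_id, collection_id, file: field }, callback);
```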