
Watson NarrowBand Speech to Text not accepting ogg file


I have a Node.js app that uses ffmpeg to create ogg files from mp3 and mp4 sources. If the source file is broadband, Watson Speech to Text accepts the ogg file with no issues. If the source file is narrowband, Watson Speech to Text fails to read the ogg file. I've checked the ffmpeg output, and the narrowband ogg file has the same audio content as the mp3 file (I can listen to it and hear the same people). And yes, I am already changing the call to Watson to correctly specify the model and content_type. Code follows:

exports.createTranscript = function(req, res, next) {
  var _name = getNameBase(req.body.movie);
  var _type = getType(req.body.movie);
  // Pick the recognition model and content type from the source file type.
  var _voice = (_type == "mp4") ? "en-US_BroadbandModel" : "en-US_NarrowbandModel";
  var _contentType = (_type == "mp4") ? "audio/ogg" : "audio/basic";
  var _audio = process.cwd() + "/HTML/movies/" + _name + 'ogg';
  var transcriptFile = process.cwd() + "/HTML/movies/" + _name + 'json';

  speech_to_text.createSession({model: _voice}, function(error, session) {
    if (error) {
      console.log('error:', error);
    } else {
      var params = {
        content_type: _contentType,
        continuous: true,
        audio: fs.createReadStream(_audio),
        session_id: session.session_id
      };
      speech_to_text.recognize(params, function(error, transcript) {
        if (error) {
          console.log('error:', error);
        } else {
          // Save the transcript to disk, then return it to the caller.
          fs.writeFile(transcriptFile, JSON.stringify(transcript), function(err) {
            if (err) { console.log(err); }
          });
          res.send(transcript);
        }
      });
    }
  });
}

_type is either mp3 (narrowband, from a phone recording) or mp4 (broadband). I have traced both model: _voice and content_type: _contentType to confirm that they are set as intended.

Any ogg file submitted to Speech to Text with the narrowband settings fails with Error: No speech detected for 30s. I tested this both with real narrowband files and by asking Watson to read a broadband ogg file (created from mp4) as narrowband. Both produce the same error message. What am I missing?


Solution

  • The documentation for Watson Speech to Text is confusing on this point. The documentation here indicates that content_type should be set to audio/basic when using the narrowband model. That's incorrect. content_type describes the format of the file being uploaded, not its bandwidth. In this example, the inbound audio file is narrowband, but it is still an ogg file, so content_type should still be audio/ogg. That single change resolves the problem.
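
As a rough illustration of the fix described above, the model and the content type can be chosen independently: the model follows the source's bandwidth, while content_type follows the format of the file actually uploaded. This is only a sketch, and recognizeSettings is a hypothetical helper name that does not appear in the original code. It assumes, as in the question, that every upload is an ogg file produced by ffmpeg and that mp4 sources are broadband.

```javascript
// Hypothetical helper (not part of the original app): returns the settings
// to pass to Speech to Text for a given source file type.
function recognizeSettings(sourceType) {
  // Assumption from the question: mp4 sources are broadband, mp3 are narrowband.
  var isBroadband = (sourceType === "mp4");
  return {
    // The model depends on the bandwidth of the source recording.
    model: isBroadband ? "en-US_BroadbandModel" : "en-US_NarrowbandModel",
    // The content type depends only on the uploaded file's format,
    // which is always ogg here, regardless of the model.
    content_type: "audio/ogg"
  };
}

// Example: settings for a narrowband phone recording converted to ogg.
console.log(recognizeSettings("mp3"));
```

In the question's code, this would amount to setting _contentType to "audio/ogg" for both source types while leaving the _voice selection unchanged.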