NodeJS app using ffmpeg to create ogg files from mp3 & mp4. If the source file is broadband, Watson Speech to Text accepts the file with no issues. If the source file is narrow band, Watson Speech to Text fails to read the ogg file. I've tested the output from ffmpeg and the narrowband ogg file has the same audio content (e.g. I can listen to it and hear the same people) as the mp3 file. Yes, in advance, I am changing the call to Watson to correctly specify the model and content_type. Code follows:
exports.createTranscript = function(req, res, next)
{ var _name = getNameBase(req.body.movie);
var _type = getType(req.body.movie);
var _voice = (_type == "mp4") ? "en-US_BroadbandModel" : "en-US_NarrowbandModel" ;
var _contentType = (_type == "mp4") ? "audio/ogg" : "audio/basic" ;
var _audio = process.cwd()+"/HTML/movies/"+_name+'ogg';
var transcriptFile = process.cwd()+"/HTML/movies/"+_name+'json';
speech_to_text.createSession({model: _voice}, function(error, session) {
if (error) {console.log('error:', error);}
else
{
var params = { content_type: _contentType, continuous: true,
audio: fs.createReadStream(_audio),
session_id: session.session_id
};
speech_to_text.recognize(params, function(error, transcript) {
if (error) {console.log('error:', error);}
else
{ fs.writeFile(transcriptFile, JSON.stringify(transcript), function(err) {if (err) {console.log(err);}});
res.send(transcript);
}
});
}
});
}
_type
is either mp3 (narrowband from phone recording) or mp4 (broadband)
model: _voice
has been traced to ensure correct setting
content_type: _contentType
has been traced to ensure correct setting
Any ogg file submitted to Speech to Text with narrowband settings fails with Error: No speech detected for 30s.
Tested with both real narrowband files and asking Watson to read a broadband ogg file (created from mp4) as narrowband. Same error message. What am I missing?
The documentation for Watson Speech to Text is confusing on this point. The documentation here indicates that when using the narrowband model, that content_type
should be set to audio/basic
. That's incorrect. In this example, the inbound audio file is a narrowband file, but it's an ogg file, so content_type
should still be audio/ogg
. That single change resolves the problem.