node.js audio ffmpeg speech-to-text ibm-watson

IBM Watson Speech to Text Audio Conversion on Node.js Web Application

The gist of the issue is that IBM Watson Speech to Text only accepts FLAC, WAV, and OGG files for upload and use with the API.

My planned solution is to convert the audio before it is sent to Watson. Essentially, when the user uploads an MP3, ffmpeg or sox would convert it to OGG, and the converted audio would then be uploaded to Watson.

What I am unsure about is what exactly I have to modify in the Node.js Watson code to allow the audio conversion to happen. Linked below is the Watson repo that I am working through. I am fairly sure the file that will have to be changed is fileupload.js, which I have linked, but I am uncertain where the changes should go.

I have looked through both SO and developerWorks (IBM's equivalent of SO) for answers to this issue, but I have not found any, which is why I am posting here. I would be happy to clarify my question if necessary.

Watson Speech to Text Repo

Solution

The Speech to Text sample application you are trying to use doesn't convert MP3 files to OGG. The src folder (which contains fileupload.js) is just JavaScript that runs on the client side (bundled with Browserify).

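The application basically connects the browser to the API using CORS, so the audio goes straight from the browser to the Watson API. Any conversion with ffmpeg or sox would therefore have to happen on the server side, before the audio reaches Watson.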

If you want to convert the audio using ffmpeg or sox, you will need to install the dependencies using a custom buildpack, since those modules have binary dependencies (C++ code in them). James Thomas has a buildpack with sox on it: https://github.com/jthomas/nodejs-buildpack. You need to update your manifest.yml to be something like:

memory: 256M 
buildpack: https://github.com/jthomas/nodejs-buildpack.git
command: npm start

Node:

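var sox = require('sox');

// Describe the MP3 -> OGG conversion; nothing runs until job.start() is called.
var job = sox.transcode('audio.mp3', 'audio.ogg', {
  sampleRate: 16000,
  format: 'ogg',
  channelCount: 2,
  bitRate: 192 * 1024,
  compressionQuality: -1
});
job.on('error', function (err) { console.error(err); });
job.on('end', function () { console.log('audio.ogg is ready'); });
job.start();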
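
If you would rather use ffmpeg than sox, a rough server-side sketch could look like the following. It assumes an ffmpeg binary built with libvorbis is on the PATH (which on Bluemix would again need a custom buildpack). The helper names needsConversion, buildFfmpegArgs and convertToOgg are made up for illustration and are not part of the Watson sample app, and the 16 kHz mono settings are my assumption rather than a documented requirement.

```javascript
// Hypothetical helpers; none of these names come from the Watson sample app.
var path = require('path');
var spawn = require('child_process').spawn;

// Extensions of the formats the question lists as accepted by the service.
var ACCEPTED = ['.flac', '.wav', '.ogg'];

// True when an upload has to be transcoded before it is sent to Watson.
function needsConversion(filename) {
  return ACCEPTED.indexOf(path.extname(filename).toLowerCase()) === -1;
}

// Build the ffmpeg argument list: overwrite output, mono, 16 kHz, Ogg Vorbis.
function buildFfmpegArgs(input, output) {
  return ['-y', '-i', input, '-ac', '1', '-ar', '16000', '-c:a', 'libvorbis', output];
}

// Run ffmpeg and report the result through a Node-style callback.
function convertToOgg(input, output, callback) {
  // stdio is ignored so ffmpeg's progress output cannot fill a pipe and stall.
  var child = spawn('ffmpeg', buildFfmpegArgs(input, output), { stdio: 'ignore' });
  var done = false;
  function finish(err) {
    if (!done) { done = true; callback(err); }
  }
  child.on('error', finish); // e.g. the ffmpeg binary was not found
  child.on('close', function (code) {
    finish(code === 0 ? null : new Error('ffmpeg exited with code ' + code));
  });
}
```

In an upload handler you would call needsConversion on the uploaded file name and, if it returns true, run convertToOgg and send the resulting .ogg file to Watson from the callback.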