I have a 6-second MP3 audio file. I first tested it directly on https://cloud.google.com/speech-to-text/ and the response was as expected:
"hello brother how are you doing I'm doing really well hope mom is doing well"
Then I created a Firebase Cloud Function (see code below):
const functions = require('firebase-functions')
const speech = require('@google-cloud/speech').v1p1beta1

exports.speechToText = functions.https.onRequest(async (req, res) => {
  try {
    // Creates a client
    const client = new speech.SpeechClient()

    // Audio file stored in Cloud Storage
    const gcsUri = `gs://xxxxx.appspot.com/speech.mp3`

    const config = {
      encoding: 'MP3',
      languageCode: 'en-US',
      enableAutomaticPunctuation: false,
      enableWordTimeOffsets: false,
    }
    const audio = {
      uri: gcsUri,
    }
    const request = {
      config: config,
      audio: audio,
    }

    // Detects speech in the audio file
    const [response] = await client.recognize(request)

    // Join the top alternative of each result into one transcript
    const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n')
    console.log(`Transcription: ${transcription}`)
    res.send({ response })
  } catch (error) {
    console.log('error:', error)
    res.status(400).send({
      error,
    })
  }
})
And I get the following INCORRECT response:
"hello brother, how are you doing hope all is doing well"
UPDATE: I get the same INCORRECT response when running the code locally, so Cloud Functions are not the issue.
UPDATE #2:
Setting model: 'video' or model: 'phone_call' in config solved the issue, i.e.:

const config = {
  encoding: 'MP3',
  languageCode: 'en-US',
  enableAutomaticPunctuation: false,
  enableWordTimeOffsets: false,
  model: 'phone_call',
}
I suppose the default model doesn't work well on certain types of audio.
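
To see which model copes best with a particular clip, the config above could be wrapped in a small helper that runs the same request once per model and collects the transcripts side by side. This is only a rough sketch, not part of the original code: `buildConfig`, `toTranscript` and `compareModels` are hypothetical names made up for illustration, and `client` is assumed to be a `SpeechClient` created the same way as in the function above.

```javascript
// Hypothetical helper: same config as in the question, with an optional model override.
function buildConfig(model) {
  const config = {
    encoding: 'MP3',
    languageCode: 'en-US',
    enableAutomaticPunctuation: false,
    enableWordTimeOffsets: false,
  }
  // Leave `model` out entirely to fall back to the API's default model
  if (model) config.model = model
  return config
}

// Hypothetical helper: joins the top alternative of each result, as in the question.
function toTranscript(response) {
  return (response.results || [])
    .map(result => result.alternatives[0].transcript)
    .join('\n')
}

// Hypothetical helper: runs recognize() once per model and returns
// an object mapping model name -> transcript ('default' when no model is set).
async function compareModels(client, gcsUri, models) {
  const transcripts = {}
  for (const model of models) {
    const [response] = await client.recognize({
      config: buildConfig(model),
      audio: { uri: gcsUri },
    })
    transcripts[model || 'default'] = toTranscript(response)
  }
  return transcripts
}

// Example call (assumes `client` is an existing SpeechClient instance):
// compareModels(client, 'gs://xxxxx.appspot.com/speech.mp3', [undefined, 'video', 'phone_call'])
//   .then(console.log)
```

Passing `undefined` as the first entry keeps the default model in the comparison, so the dropped words are easy to spot next to the 'video' and 'phone_call' output.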