I'm using IBM Speech to Text. The results are OK, but I'm wondering why they are not sorted by confidence, highest first. Is there a parameter that returns them sorted, so that I could just pick the first alternative? Ideally, a result would only be returned if the passed keyword is also found.
There is a max_alternatives parameter that defaults to 1, but even when I specify it explicitly, more than one alternative is returned.
I'm currently sorting the response manually and I need no code sample for accomplishing this.
JSON example:
"result": {
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "l\u00f6schen es tut echte betroffen ",
          "confidence": 0.71
        }
      ],
      "keywords_result": {}
    },
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "sie sp\u00fcren dass eine \u00e4ra zu ende ",
          "confidence": 0.91
        }
      ],
      "keywords_result": {}
    },
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "auto fahre eins zwei drei vier ",
          "confidence": 0.95
        }
      ],
      "keywords_result": {
        "auto": [
          {
            "start_time": 6.31,
            "end_time": 7.19,
            "confidence": 0.99,
            "normalized_text": "auto"
          }
        ]
      }
    }
  ]
},
...
The issue was the end_of_phrase_silence_time parameter. Whenever a pause of at least this length (0.8 seconds by default) is detected, the speech is split into an additional phrase. So what I was seeing were not different recognition alternatives for the same speech, but separate phrases from earlier in the audio recording, each returned as its own entry in results. See the documentation of the end_of_phrase_silence_time parameter.
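Since each entry in results is a separate phrase rather than a competing alternative, the "only return a result if the keyword is found" part can be handled on the client by keeping just the phrases whose keywords_result contains the keyword. Below is a minimal sketch in Python, not an official API. The function name phrases_with_keyword and the output field names are made up for illustration. It assumes the response has the shape shown in the JSON example above and that the first entry in alternatives is the best one for that phrase.

```python
from typing import Any, Dict, List


def phrases_with_keyword(response: Dict[str, Any], keyword: str) -> List[Dict[str, Any]]:
    """Return the first alternative of every phrase in which `keyword` was spotted.

    `response` is the object that holds the "results" list (the value of
    "result" in the JSON example). Hypothetical helper, for illustration only.
    """
    matches = []
    for phrase in response.get("results", []):
        # keywords_result maps each spotted keyword to a list of occurrences;
        # it is an empty object when nothing was spotted in this phrase.
        hits = phrase.get("keywords_result", {}).get(keyword, [])
        alternatives = phrase.get("alternatives", [])
        if not hits or not alternatives:
            continue
        best = alternatives[0]  # assumption: first alternative is the best one
        matches.append({
            "transcript": best["transcript"].strip(),
            "confidence": best.get("confidence"),
            "keyword_confidence": max(hit["confidence"] for hit in hits),
        })
    return matches


# Sample data copied from the JSON example in the question.
sample = {
    "result_index": 0,
    "results": [
        {
            "final": True,
            "alternatives": [
                {"transcript": "l\u00f6schen es tut echte betroffen ", "confidence": 0.71}
            ],
            "keywords_result": {},
        },
        {
            "final": True,
            "alternatives": [
                {"transcript": "sie sp\u00fcren dass eine \u00e4ra zu ende ", "confidence": 0.91}
            ],
            "keywords_result": {},
        },
        {
            "final": True,
            "alternatives": [
                {"transcript": "auto fahre eins zwei drei vier ", "confidence": 0.95}
            ],
            "keywords_result": {
                "auto": [
                    {
                        "start_time": 6.31,
                        "end_time": 7.19,
                        "confidence": 0.99,
                        "normalized_text": "auto",
                    }
                ]
            },
        },
    ],
}

if __name__ == "__main__":
    print(phrases_with_keyword(sample, "auto"))
```

With the sample above, only the third phrase is kept, because it is the only one whose keywords_result contains "auto". If the recording should not be split at short pauses in the first place, the other option is to pass a different end_of_phrase_silence_time value with the recognition request.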