I am new to both transformers.js and Whisper, and I am trying to get the return_timestamps parameter to work.
I customized script.js from the transformers.js demo locally and added data.generation.return_timestamps = "char"; around line 447, inside the GENERATE_BUTTON click handler, to pass the parameter. With that change in place, timestamps now appear as chunks (result in worker.js):
{
  "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
  "chunks": [
    {
      "timestamp": [0, 8],
      "text": " And so my fellow Americans ask not what your country can do for you"
    },
    {
      "timestamp": [8, 11],
      "text": " ask what you can do for your country."
    }
  ]
}
However, the chunks are not character-level granular, as I expected from the return_timestamps documentation.
I am looking for ideas on how to achieve char/word-level timestamp granularity with transformers.js and Whisper. Do some models or tools need to be updated and/or rebuilt?
Creator of transformers.js here. Yesterday, I added support for word-level timestamps (v2.4.0). You can use it as follows:
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
  revision: 'output_attentions',
});
let output = await transcriber(url, { return_timestamps: 'word' });
// {
// "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
// "chunks": [
// { "text": " And", "timestamp": [0, 0.78] },
// { "text": " so", "timestamp": [0.78, 1.06] },
// { "text": " my", "timestamp": [1.06, 1.46] },
// ...
// { "text": " for", "timestamp": [9.72, 9.92] },
// { "text": " your", "timestamp": [9.92, 10.22] },
// { "text": " country.", "timestamp": [10.22, 13.5] }
// ]
// }
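
If you still need something closer to the character-level granularity asked about in the question, one option is to approximate it from the word-level chunks. The following is only a rough sketch: approximateCharTimestamps is a hypothetical helper (it is not part of transformers.js), and it assumes each character takes an equal share of its word's time span, which is a simplification rather than anything the model measures.

```javascript
// Hypothetical helper (not part of transformers.js): spreads each word's
// time span evenly across its characters. This is an approximation only.
function approximateCharTimestamps(chunks) {
  const chars = [];
  for (const { text, timestamp } of chunks) {
    const [start, end] = timestamp;
    // Skip chunks without a usable time span.
    if (start == null || end == null) continue;

    // Array.from splits by Unicode code point, not by UTF-16 code unit.
    const symbols = Array.from(text);
    const step = (end - start) / symbols.length;

    symbols.forEach((char, i) => {
      chars.push({
        text: char,
        timestamp: [start + i * step, start + (i + 1) * step],
      });
    });
  }
  return chars;
}

// Example with two of the word-level chunks shown above.
const sampleChunks = [
  { text: ' And', timestamp: [0, 0.78] },
  { text: ' so', timestamp: [0.78, 1.06] },
];
console.log(approximateCharTimestamps(sampleChunks));
```

With real output you would call approximateCharTimestamps(output.chunks). Note that the leading space of each word is counted as a character here, so it receives its own slice of time.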