
transformers.js with whisper and return_timestamps


I am new to both transformers.js and Whisper, and I am trying to get the return_timestamps parameter to work.

I customized script.js from the transformers.js demo locally and added data.generation.return_timestamps = "char"; around line 447, inside the GENERATE_BUTTON click handler, in order to pass the parameter. With that change in place, timestamps appear in the chunks of the result (as seen in worker.js):

{
    "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
    "chunks": [
        {
            "timestamp": [0,8],
            "text": " And so my fellow Americans ask not what your country can do for you"
        },
        {
            "timestamp": [8,11],
            "text": " ask what you can do for your country."
        }
    ]
}

However, the chunks are not character-level granular, which is what I expected from the return_timestamps documentation.

I am looking for ideas on how to achieve character- or word-level timestamp granularity with transformers.js and Whisper. Do some models or tools need to be updated and/or rebuilt?


Solution

  • Creator of transformers.js here. Yesterday, I added support for word-level timestamps (v2.4.0). You can use it as follows:

    import { pipeline } from '@xenova/transformers';
    
    let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
    let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
        revision: 'output_attentions',
    });
    let output = await transcriber(url, { return_timestamps: 'word' });
    // {
    //   "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
    //   "chunks": [
    //     { "text": " And", "timestamp": [0, 0.78] },
    //     { "text": " so", "timestamp": [0.78, 1.06] },
    //     { "text": " my", "timestamp": [1.06, 1.46] },
    //     ...
    //     { "text": " for", "timestamp": [9.72, 9.92] },
    //     { "text": " your", "timestamp": [9.92, 10.22] },
    //     { "text": " country.", "timestamp": [10.22, 13.5] }
    //   ]
    // }
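
Once the word-level chunks are available, a common next step is to look up the word spoken at a given playback time, or to merge words into short caption-sized phrases. The sketch below is one possible way to do that. The helpers wordAt and groupWords are hypothetical names, not part of transformers.js; they only assume the chunk shape shown in the output above ({ text, timestamp: [start, end] }).

```javascript
// Hypothetical helpers (not part of transformers.js) for the `chunks` array
// shown above: [{ text, timestamp: [start, end] }, ...], times in seconds.

// Return the chunk whose [start, end) interval contains `t`, or null if none does.
function wordAt(chunks, t) {
    return chunks.find(({ timestamp: [start, end] }) => t >= start && t < end) ?? null;
}

// Merge consecutive word chunks into phrases of at most `maxWords` words each.
// Each phrase spans from the start of its first word to the end of its last word.
function groupWords(chunks, maxWords) {
    if (!(maxWords >= 1)) throw new RangeError('maxWords must be at least 1');
    const groups = [];
    for (let i = 0; i < chunks.length; i += maxWords) {
        const slice = chunks.slice(i, i + maxWords);
        groups.push({
            // Word chunks carry their own leading space, so a plain join is enough.
            text: slice.map((c) => c.text).join('').trim(),
            timestamp: [slice[0].timestamp[0], slice[slice.length - 1].timestamp[1]],
        });
    }
    return groups;
}

// Sample data copied from the first three chunks of the output above.
const sample = [
    { text: ' And', timestamp: [0, 0.78] },
    { text: ' so', timestamp: [0.78, 1.06] },
    { text: ' my', timestamp: [1.06, 1.46] },
];

console.log(wordAt(sample, 0.9));    // the " so" chunk
console.log(groupWords(sample, 2));  // "And so" [0, 1.06], then "my" [1.06, 1.46]
```

In a real page you would pass output.chunks from the transcriber call instead of the sample array, for example calling wordAt(output.chunks, audio.currentTime) from a timeupdate handler to highlight the current word.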