python, python-asyncio

Store value of a callback in Python using asyncio


I am trying to use Deepgram streaming speech recognition for a project. I can stream the transcribed text to the console using their quickstart demo code, but the text is printed from within a callback function. I would like to get the individual chunks of transcribed text out of the callback into a single string (or an array or whatever) so I can format longer pieces of the transcription before printing it.

This seems like a similar problem to [this question](https://stackoverflow.com/a/66279927), but I think my situation needs to be treated differently due to asyncio (or something else I am not understanding).

This works, but just dumps each little piece of transcribed text to the console:

from deepgram import Deepgram
import asyncio
import aiohttp

DEEPGRAM_API_KEY = '****'
URL = 'http://stream.live.vc.bbcmedia.co.uk/bbc_world_service'

async def main():
  deepgram = Deepgram(DEEPGRAM_API_KEY)

  # Create a websocket connection to Deepgram
  deepgramLive = await deepgram.transcription.live({ 'language': 'en-US' })

  # Listen for the connection to close
  deepgramLive.registerHandler(deepgramLive.event.CLOSE, lambda c: print(f'Connection closed with code {c}.'))

  # Listen for any transcripts received from Deepgram and write them to the console
  deepgramLive.registerHandler(deepgramLive.event.TRANSCRIPT_RECEIVED, print_transcript) # using anything more complex/persistent than print_transcript here throws 'raise AttributeError(name) from None' error

  # Listen for the connection to open and send streaming audio from the URL to Deepgram
  async with aiohttp.ClientSession() as session:
    async with session.get(URL) as audio:
      while True:
        data = await audio.content.readany()
        deepgramLive.send(data)

        # do more with the transcribed chunks here?

        if not data:
          break
  await deepgramLive.finish()


def print_transcript(json_data):
  print(json_data['channel']['alternatives'][0]['transcript'])


asyncio.run(main())

I tried using a class with a __call__ method as in the other question and I tried messing with asyncio.Queue, but I'm missing something.


Solution

  • Their Python documentation is horrendous, so we have to check the source code. It seems the LiveTranscription.register_handler method expects the handler argument to be of type EventHandler, as defined in the SDK source. That is just a function that can be called with one argument of any type and returns None, or an equivalent coroutine function.

    This is still very badly typed because we have absolutely no idea what type of object this handler will receive in general. But judging from your code with that print_transcript function, you seem to be expecting a dictionary (or something similar).
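
    To make that handler contract a bit more concrete, here is a rough sketch of what such a type alias could look like, plus a tiny dispatcher that accepts both plain functions and coroutine functions. The alias, `dispatch`, and the sample payload are my own illustration and are not copied from the SDK; the payload shape is only borrowed from your print_transcript function.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

# Hypothetical alias mirroring the description above; not copied from the SDK.
EventHandler = Callable[[Any], Union[None, Awaitable[None]]]

received: list[Any] = []


def sync_handler(data: Any) -> None:
    received.append(("sync", data))


async def async_handler(data: Any) -> None:
    received.append(("async", data))


async def dispatch(handler: EventHandler, data: Any) -> None:
    """Call the handler and await the result only if it is awaitable."""
    result = handler(data)
    if inspect.isawaitable(result):
        await result


async def demo() -> None:
    # Made-up payload shaped like the one print_transcript indexes into.
    payload = {"channel": {"alternatives": [{"transcript": "hello"}]}}
    await dispatch(sync_handler, payload)
    await dispatch(async_handler, payload)


asyncio.run(demo())
print([kind for kind, _ in received])
```

    Either kind of handler ends up being called with exactly one argument, which is why the storage object has to reach the handler some other way.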

    If you want to store those objects rather than just printing and discarding them, you have many options. One would be to write a handler function that takes some sort of data structure (a list, for example) as an additional argument and stores the objects in it instead of printing them. Then use functools.partial in your main function to pre-bind a storage object to that argument before passing the resulting function to register_handler.

    Something like this:

    from functools import partial
    from typing import Any
    
    
    def store_data(data: Any, storage: list[Any]) -> None:
        storage.append(data)
    
    
    async def main() -> None:
        ...
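        storage: list[Any] = []
        # Pre-bind the list so the handler can still be called with one argument.
        handler = partial(store_data, storage=storage)
        # Same object as deepgramLive in the question, where the method is spelled registerHandler.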
        deepgram_live.register_handler(deepgram_live.event.TRANSCRIPT_RECEIVED, handler)
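
    Since the snippet above cannot run on its own, here is a self-contained sketch of the same functools.partial idea that stores only the transcript text and joins it into a single string at the end. The payload shape comes from your print_transcript function; `store_transcript`, `fake_event`, and the sample events are made up for the demo, and skipping messages without a transcript is an assumption about what the stream may contain.

```python
from functools import partial
from typing import Any


def store_transcript(data: Any, storage: list[str]) -> None:
    """Append the transcript text of one event to storage, skipping empty ones."""
    # .get() guards against messages that carry no transcript (an assumption).
    alternatives = data.get("channel", {}).get("alternatives", [])
    if alternatives and alternatives[0].get("transcript"):
        storage.append(alternatives[0]["transcript"])


def fake_event(text: str) -> dict[str, Any]:
    """Build a made-up payload for the demo; not real Deepgram output."""
    return {"channel": {"alternatives": [{"transcript": text}]}}


chunks: list[str] = []
handler = partial(store_transcript, storage=chunks)

# Simulate the SDK calling the handler once per event with a single argument.
events = (fake_event("the quick"), fake_event(""), {"type": "Metadata"}, fake_event("brown fox"))
for event in events:
    handler(event)

full_text = " ".join(chunks)
print(full_text)
```

    In the real program you would pass `handler` to the registration call instead of looping over fake events, and read `chunks` whenever you want to format a longer piece of text.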
    

    Another almost equivalent option would be to define that handler function inside the main function and provide it access to a storage object from within that main function's scope:

    from typing import Any
    
    async def main() -> None:
        ...
        storage: list[Any] = []  # filled by the nested handler below
    
        def store_data(data: Any) -> None:
            storage.append(data)
        
        deepgram_live.register_handler(deepgram_live.event.TRANSCRIPT_RECEIVED, store_data)
    

    You could indeed use an asyncio.Queue instead of a simple list if you want, but the principles of how you make the handler function access that queue object are still the same.
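
    For the queue variant, a minimal sketch could look like the following. It assumes the SDK invokes the handler on the same event loop thread (put_nowait is not thread-safe), and it uses a `None` sentinel of my own choosing to tell the consumer that the stream is over. The `collect` and `enqueue_transcript` names and the sample payloads are hypothetical.

```python
import asyncio
from typing import Any


async def collect(queue: asyncio.Queue) -> str:
    """Consume chunks until the None sentinel arrives, then join them."""
    chunks = []
    while True:
        chunk = await queue.get()
        if chunk is None:  # sentinel pushed once the stream is finished
            break
        chunks.append(chunk)
    return " ".join(chunks)


async def demo() -> str:
    queue: asyncio.Queue = asyncio.Queue()

    def enqueue_transcript(data: Any) -> None:
        # Synchronous handler, so use put_nowait() rather than awaiting put().
        queue.put_nowait(data["channel"]["alternatives"][0]["transcript"])

    consumer = asyncio.create_task(collect(queue))

    # Stand-in for the SDK firing TRANSCRIPT_RECEIVED; payloads are made up.
    for text in ("good evening", "this is", "the news"):
        enqueue_transcript({"channel": {"alternatives": [{"transcript": text}]}})
        await asyncio.sleep(0)  # yield so the consumer gets a chance to run

    queue.put_nowait(None)  # signal end of stream
    return await consumer


result = asyncio.run(demo())
print(result)
```

    The consumer task is the place to do the "format longer pieces before printing" part, since it sees the chunks in order and can decide when it has enough text.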

    I don't use Deepgram, so I have not tested this, but at least from what I could gather from the poor documentation, the source, and your example, I think this should work.