Tags: c++, boost, voice-recognition

Kaldi - how to share language model among multiple decoders?


I'm using Kaldi to decode lots of audio samples every day. My plan is to have multiple decoders running in parallel, all decoding against the same language model. For this it would be ideal if a single language model loaded into memory could be shared by multiple decoders. The model I have right now is 1 GB on disk and uses around 3 GB of memory, so it would save a lot of memory if it only had to be loaded once.

Has anyone ever tried something like this? Is it doable?

  • I have not found anything about it in the Kaldi documentation.
  • I was thinking of using the boost::interprocess library to share the fst::VectorFst object returned by fst::ReadFstKaldi(), as this is the biggest object. But this looks like a big undertaking: it is a complex custom type, and I'm not sure boost::interprocess can handle it. I don't want to go down the road of customizing the Kaldi objects just to make them compatible with Boost shared memory.

Any other ideas about this approach?


Solution

  • You do not need multiple processes; just share the fst object across threads within one process. The decoding graph is constant during decoding, so there is no need to protect it with locks. Create a separate decoder in every worker thread, each holding a pointer to the shared fst. You can use boost::asio::io_service to dispatch the incoming decode requests to the workers. A sketch of this layout follows below.
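
Below is a minimal sketch of that layout, assuming the usual Kaldi API (fst::ReadFstKaldi from fstext/kaldi-fst-io.h, LatticeFasterDecoder from decoder/lattice-faster-decoder.h). The acoustic model, feature pipeline and DecodableInterface setup are deliberately omitted, and the graph path "HCLG.fst" and worker count are placeholders, so treat this as an outline of the threading structure rather than a complete program:

    // One process: load the graph once, run N worker threads, each request
    // gets its own lightweight decoder that references the shared graph.
    #include <memory>
    #include <string>

    #include <boost/asio/io_service.hpp>
    #include <boost/thread.hpp>

    #include "fstext/kaldi-fst-io.h"
    #include "decoder/lattice-faster-decoder.h"

    int main() {
      // Load the decoding graph ONCE (~3 GB in memory in the question's case).
      // ReadFstKaldi returns a heap-allocated VectorFst<StdArc>*.
      std::unique_ptr<fst::VectorFst<fst::StdArc>> graph(
          fst::ReadFstKaldi("HCLG.fst"));  // placeholder path

      kaldi::LatticeFasterDecoderConfig decoder_opts;

      boost::asio::io_service io_service;
      // Keep io_service.run() from returning while the request queue is empty.
      boost::asio::io_service::work keep_alive(io_service);

      // Handler for one decode request; the decoder only stores a reference
      // to the shared graph, so constructing it per request is cheap compared
      // to loading the 1 GB model. You could equally keep one decoder per
      // worker thread instead.
      auto handle_request = [&](const std::string &utt_id) {
        kaldi::LatticeFasterDecoder decoder(*graph, decoder_opts);
        // ... build a DecodableInterface from this utterance's features,
        // call decoder.Decode(&decodable), then extract the lattice ...
        (void)utt_id;
      };

      const int num_workers = 4;  // e.g. one per core
      boost::thread_group workers;
      for (int i = 0; i < num_workers; ++i) {
        // Every worker pulls posted requests off the shared io_service queue.
        workers.create_thread([&]() { io_service.run(); });
      }

      // Elsewhere in the server, requests are enqueued like this:
      io_service.post([&]() { handle_request("utt-0001"); });

      workers.join_all();
      return 0;
    }

Since the graph is only read and never modified after loading, all workers can dereference the same pointer concurrently without synchronization; only the per-decoder state (tokens, lattices) is thread-private.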