Tags: winapi, audio, speech-recognition, speech-to-text, sapi

How to use ISpStreamFormatConverter?


I am trying to convert a raw PCM stream captured from a microphone (48,000 Hz) to a wave format (44,100 Hz) that ISpRecognizer will accept. SetRecoState(SPRST_ACTIVE_ALWAYS) returns AUDCLNT_E_UNSUPPORTED_FORMAT for the 48,000 Hz PCM stream but works fine for a 44,100 Hz WAV file.

I create an instance of ISpStreamFormatConverter and supply it with my existing stream by calling ISpStreamFormatConverter::SetBaseStream(), passing my own implementation of ISpStreamFormat that sits on top of an existing IStream. The converter successfully calls my implementation of ISpStreamFormat::GetFormat, but when I call ISpStreamFormatConverter::RemoteRead() or ISpStreamFormatConverter::RemoteCopyTo(), I always get the SPERR_UNINITIALIZED error code.

Do I need to perform any additional steps before the conversion can proceed? I could not find any examples of using the ISpStreamFormatConverter interface.

UPDATE. This is the Delphi code that attempts to use ISpStreamFormatConverter:

res := CoCreateInstance(CLASS_SpStreamFormatConverter,
   nil, CLSCTX_INPROC_SERVER,
   IID_ISpStreamFormatConverter,
   SpStreamFormatConverter);
if CheckFunction(res, 'CoCreateInstance(CLASS_SpStreamFormatConverter)') then begin
  fFileStream.Position := 0;
  // TSpStreamFormat is my own class that implements ISpStreamFormat
  iSourceStream := TSpStreamFormat.Create(fFileStream, fCaptureWaveFormatEx) as ISpStreamFormat;
  res := SpStreamFormatConverter.SetBaseStream(SpeechLib_TLB.ISpStreamFormat(iSourceStream), 0, 0);
  if CheckFunction(res, 'ISpStreamFormatConverter.SetBaseStream') then begin
    res := SpStreamFormatConverter.ResetSeekPosition;
    if CheckFunction(res, 'ISpStreamFormatConverter.ResetSeekPosition') then begin
      res := cpRecognizer.SetInput(SpStreamFormatConverter, 1);
      if CheckFunction(res, 'ISpRecognizer.SetInput') then begin
        res := cpRecognizer.SetRecoState(SPRST_ACTIVE_ALWAYS);
      end;
    end;
  end;
end;

Solution

  • It looks like you're missing one step - after calling

    SpStreamFormatConverter.SetBaseStream(SpeechLib_TLB.ISpStreamFormat(iSourceStream), 0, 0);
    

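    you need to call SetFormat to define the output format. With the second argument of SetBaseStream (fSetFormatToBaseStreamFormat) set to 0, the converter does not adopt a format from the base stream, which is the likely reason reads fail with SPERR_UNINITIALIZED: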

    SpStreamFormatConverter.SetFormat(SPDFID_WaveFormatEx, pConvertedWaveFormatEx);
    

    (I'm not familiar with Delphi, so it's likely you'll have to tweak this somewhat to compile.)
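
The pConvertedWaveFormatEx argument has to describe the target format. Below is a minimal sketch of filling it in for uncompressed PCM, assuming the standard TWaveFormatEx record and WAVE_FORMAT_PCM constant from the MMSystem unit. The unit name PcmFormatSketch and the helper MakePcmFormat are hypothetical, and 16-bit mono in the usage note is an assumption, since the question states only the sample rate.

```delphi
unit PcmFormatSketch; // hypothetical unit name

interface

uses
  MMSystem; // Winapi.MMSystem in newer Delphi versions

// Hypothetical helper: describes uncompressed PCM with the given parameters.
function MakePcmFormat(SamplesPerSec: Cardinal;
  Channels, BitsPerSample: Word): TWaveFormatEx;

implementation

function MakePcmFormat(SamplesPerSec: Cardinal;
  Channels, BitsPerSample: Word): TWaveFormatEx;
begin
  FillChar(Result, SizeOf(Result), 0);
  Result.wFormatTag := WAVE_FORMAT_PCM;
  Result.nChannels := Channels;
  Result.nSamplesPerSec := SamplesPerSec;
  Result.wBitsPerSample := BitsPerSample;
  // Bytes per sample frame, counting all channels.
  Result.nBlockAlign := Channels * (BitsPerSample div 8);
  // Bytes consumed per second of audio.
  Result.nAvgBytesPerSec := SamplesPerSec * Result.nBlockAlign;
  Result.cbSize := 0; // plain PCM carries no extra format bytes
end;

end.
```

With these assumptions, MakePcmFormat(44100, 1, 16) gives nBlockAlign = 2 and nAvgBytesPerSec = 88200. How the record is then handed to SetFormat (as a pointer or as a var parameter, and whether a cast to the WaveFormatEx type declared in SpeechLib_TLB is needed) depends on how the type library was imported, so that call may need adjusting.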