Tags: node.js, ogg, google-speech-to-text-api

Creating an Ogg packet from Opus buffers in nodejs


I've been pretty stuck on this problem for a few days now, praying that someone will be able to point me in the right direction.

I have a stream of Opus buffers as encoded by https://github.com/discordjs/opus

I want to send these to the Google Speech-to-Text API, which requires them to be encapsulated in Ogg containers: https://cloud.google.com/speech-to-text/docs/reference/rpc/google.cloud.speech.v1#audioencoding
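For reference, here is a hedged sketch of the request shape that the `@google-cloud/speech` Node client's `streamingRecognize()` takes for this encoding. `buildOggOpusStreamingRequest` is a hypothetical helper. The 48 kHz rate is an assumption based on Discord voice audio. The actual client call is left commented out because it needs credentials.

```javascript
// Hypothetical helper: builds the request object passed to
// SpeechClient#streamingRecognize() for Ogg-encapsulated Opus audio.
// Assumption: the Opus audio is 48 kHz (typical for Discord voice).
function buildOggOpusStreamingRequest({ languageCode = 'en-US', channels = 2 } = {}) {
  return {
    config: {
      // OGG_OPUS = Opus frames inside an Ogg container
      encoding: 'OGG_OPUS',
      // OGG_OPUS accepts 8000, 12000, 16000, 24000 or 48000 Hz
      sampleRateHertz: 48000,
      audioChannelCount: channels,
      languageCode,
    },
    interimResults: false,
  };
}

const request = buildOggOpusStreamingRequest({ channels: 1 });
console.log(JSON.stringify(request));

// With credentials configured, it would be used roughly like this
// (oggPageStream is a hypothetical readable stream of Ogg pages):
// const speech = require('@google-cloud/speech');
// const client = new speech.SpeechClient();
// const recognizeStream = client
//   .streamingRecognize(request)
//   .on('data', (data) => console.log(data.results));
// oggPageStream.pipe(recognizeStream);
```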

I'm trying to use this library: https://github.com/TooTallNate/node-ogg

Here is what I'm trying:

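const { Writable } = require('stream');
const ogg = require('ogg');

// voiceStream: Opus frames from @discordjs/opus (created elsewhere)
// recognizeStream: the Google Speech-to-Text streaming stream (created elsewhere)

const oggEncoder = new ogg.Encoder();
const oggStream = oggEncoder.stream();

// Forward every Opus frame into the Ogg logical bitstream
const audioInputStreamTransform = new Writable({
  write(frame, encoding, next) {
    if (frame) {
      oggStream.write(frame);
    }
    next();
  },
});

voiceStream.pipe(audioInputStreamTransform);
oggEncoder.pipe(recognizeStream);

// Neither of these work: nothing appears to be happening
// No data events emitted from either stream
// oggStream.pipe(recognizeStream);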

I've also tried using the ogg-packet library to wrap each buffer in an ogg_packet struct before passing it to oggStream.write. This also results in no data events being emitted. I'm pretty sure this is the wrong approach, given that ogg-packet says:

You'll most likely not need to use this module for any practical purposes

but I thought I would try anyway.

What I've tried

const ogg_packet = require('ogg-packet');

let packetno = 0;

// (inside write(frame, encoding, next) above)
const packet = new ogg_packet();
packet.packet = frame;
packet.bytes = frame.length;

// only the first packet in the ogg stream marks the beginning of stream
packet.b_o_s = packetno === 0 ? 1 : 0;
// there will be more `ogg_packet`s after this one in the ogg stream
packet.e_o_s = 0;

// the "packetno" should increment by one for each packet in the ogg stream
packet.packetno = packetno++;

// No joy with any of these
//oggStream.write(ogg.ogg_packet(packet));
//oggStream.write(packet);
//oggStream.write(packet.buffer);
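Whichever write call turns out to be right, the field values matter too. In an Ogg Opus stream (RFC 7845), only packet 0, the OpusHead header, carries b_o_s. Packet 1 is OpusTags. Audio packets need a granulepos that counts the 48 kHz samples decoded so far.

The sketch below computes those fields as plain objects. It assumes every frame is 20 ms (960 samples). `audioPacketFields` is a hypothetical helper and does not use ogg-packet itself.

```javascript
// Hypothetical helper: ogg_packet-style fields for a run of Opus audio frames.
// Assumptions: every frame is 20 ms at 48 kHz (960 samples), and packetno 0/1
// are already used by the OpusHead and OpusTags header packets.
const SAMPLES_PER_FRAME = 960; // 20 ms * 48 samples per ms
const FIRST_AUDIO_PACKETNO = 2;

function audioPacketFields(frames, { lastInStream = false } = {}) {
  let granulepos = 0;
  return frames.map((frame, i) => {
    granulepos += SAMPLES_PER_FRAME;
    return {
      packet: frame,
      bytes: frame.length,
      b_o_s: 0, // only the OpusHead packet begins the stream
      e_o_s: lastInStream && i === frames.length - 1 ? 1 : 0,
      granulepos, // total 48 kHz samples decoded up to the end of this packet
      packetno: FIRST_AUDIO_PACKETNO + i,
    };
  });
}

const fields = audioPacketFields(
  [Buffer.alloc(40), Buffer.alloc(38), Buffer.alloc(41)],
  { lastInStream: true },
);
console.log(fields.map(({ bytes, b_o_s, e_o_s, granulepos, packetno }) =>
  ({ bytes, b_o_s, e_o_s, granulepos, packetno })));
```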

I'm a real novice when it comes to audio encoding, so I'm probably misunderstanding some part of this process. Apologies if it's something trivial, but I've been working on this for about a week now 😅

If there's a better place to ask for help please feel free to move me along - thanks :)

Also tried something like this example from node-opus with no luck


OK so further digging:

I downloaded node-opus, which is documented to work with node-ogg.

I have noticed that the result of Encoder.encode is not the same between node-opus and @discordjs/opus. node-opus seems to output what I believe is an ogg_packet, whereas @discordjs/opus gives a plain Buffer.

i.e. opus stream -> @discordjs/opus decode -> node-opus encode -> log:

{ packet: <Buffer 4f>,
  bytes: 19,
  b_o_s: 1,
  e_o_s: 0,
  granulepos: -1,
  packetno: 0,
  'ref.buffer':
   <Buffer 18 33 11 04 01 00 00 00 13 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00> }
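The 19-byte packet whose first byte is 0x4f ('O') looks consistent with the Opus identification header. For channel mapping family 0, that header is exactly 19 bytes and starts with the ASCII magic `OpusHead` (RFC 7845 §5.1).

Below is a sketch of building that header by hand. `buildOpusHead` is a hypothetical helper, and its defaults are assumptions. In particular, a pre-skip of 312 is a common libopus lookahead value, but you should use your encoder's actual value.

```javascript
// Sketch of the 19-byte Opus identification header ("OpusHead", RFC 7845 §5.1)
// for channel mapping family 0. Defaults are assumptions, not required values.
function buildOpusHead({ channels = 2, preSkip = 312, inputSampleRate = 48000, outputGain = 0 } = {}) {
  const head = Buffer.alloc(19);
  head.write('OpusHead', 0, 'ascii');      // magic signature, bytes 0-7
  head.writeUInt8(1, 8);                   // version
  head.writeUInt8(channels, 9);            // output channel count (1 or 2 for family 0)
  head.writeUInt16LE(preSkip, 10);         // 48 kHz samples to discard at the start
  head.writeUInt32LE(inputSampleRate, 12); // original input rate (informational only)
  head.writeInt16LE(outputGain, 16);       // output gain in Q7.8 dB
  head.writeUInt8(0, 18);                  // channel mapping family 0 (mono/stereo)
  return head;
}

const head = buildOpusHead({ channels: 2 });
console.log(head.length, head[0].toString(16)); // prints: 19 4f
```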

compared to opus stream -> @discordjs/opus decode -> @discordjs/opus encode -> log:

<Buffer 78 80 64 26 7e d0 2f e8 f5 a5 6d 1c da 41 04 0b 33 d9 ee 3a 0b ee 53 a6 f6 bb cf 55 c8 e3 36 e1 18 4a 9f e9 7f 94 8d a3 0c 96 b3 a1 f7 03 e7 9a 78 db ... >
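If the @discordjs/opus output is a bare Opus packet, its first byte should parse as the TOC byte defined in RFC 6716 §3.1. The parser below is a quick sanity check. `parseToc` is a hypothetical helper, and its frame-size tables follow the RFC's configuration table.

```javascript
// Reads the first byte of a raw Opus packet as the RFC 6716 §3.1 TOC byte.
// Assumption: the Buffer is a bare Opus packet with no Ogg framing.
const FRAME_MS = {
  silk: [10, 20, 40, 60],  // configs 0-11
  hybrid: [10, 20],        // configs 12-15
  celt: [2.5, 5, 10, 20],  // configs 16-31
};

function parseToc(packet) {
  const toc = packet[0];
  const config = toc >> 3;           // bits 7..3: mode, bandwidth, frame size
  const stereo = (toc & 0x04) !== 0; // bit 2: stereo flag
  const frameCountCode = toc & 0x03; // bits 1..0: 0 = one frame in the packet
  let mode;
  let frameMs;
  if (config < 12) {
    mode = 'silk';
    frameMs = FRAME_MS.silk[config % 4];
  } else if (config < 16) {
    mode = 'hybrid';
    frameMs = FRAME_MS.hybrid[config % 2];
  } else {
    mode = 'celt';
    frameMs = FRAME_MS.celt[config % 4];
  }
  return { config, mode, frameMs, stereo, frameCountCode };
}

// First bytes of the @discordjs/opus output logged above
console.log(parseToc(Buffer.from([0x78, 0x80, 0x64, 0x26])));
```

For the logged buffer, 0x78 parses as configuration 15 (hybrid, fullband, 20 ms) with one frame per packet. That fits a raw Opus packet rather than an ogg_packet struct.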

So that must be my problem. My guess is that I need to create ogg packets from these buffers...
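Creating the audio packets is only part of it. Per RFC 7845, an Ogg Opus stream starts with the OpusHead packet on its own beginning-of-stream page. The OpusTags comment header comes next, and only then the audio packets.

Below is a sketch of the comment header layout. `buildOpusTags` and the placeholder vendor string are hypothetical.

```javascript
// Sketch of the Opus comment header ("OpusTags", RFC 7845 §5.2),
// the second packet of an Ogg Opus stream. Vendor/comments are placeholders.
function lengthPrefix(n) {
  const buf = Buffer.alloc(4);
  buf.writeUInt32LE(n, 0); // all lengths are 32-bit little-endian
  return buf;
}

function buildOpusTags(vendor = 'hypothetical-encoder', comments = []) {
  const vendorBuf = Buffer.from(vendor, 'utf8');
  const parts = [
    Buffer.from('OpusTags', 'ascii'), // magic signature
    lengthPrefix(vendorBuf.length),
    vendorBuf,
    lengthPrefix(comments.length),    // number of user comments
  ];
  for (const comment of comments) {
    const c = Buffer.from(comment, 'utf8'); // "KEY=value" strings
    parts.push(lengthPrefix(c.length), c);
  }
  return Buffer.concat(parts);
}

const tags = buildOpusTags('abc', ['TITLE=x']);
console.log(tags.length);
```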

I'm curious why these two Opus encoding libraries are so different, though, unless I've royally messed something up.


Solution

  • As above, the answer was that the ogg package expects ogg_packet structs, and @discordjs/opus does not produce them: its encoder returns plain Buffers.
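One alternative that sidesteps node-ogg's ogg_packet requirement is to write Ogg pages directly (RFC 3533). Below is a minimal sketch under these assumptions:

- Each call emits whole packets on one page, with at most 255 lacing values.
- The caller supplies the serial number, page sequence number, granule position and flags.

`buildOggPage` and `oggCrc` are hypothetical names, not node-ogg APIs.

```javascript
// Minimal Ogg page writer (RFC 3533) that bypasses node-ogg entirely.

// Ogg's CRC-32: polynomial 0x04c11db7, MSB-first, initial value 0, no final XOR.
const CRC_TABLE = new Uint32Array(256);
for (let i = 0; i < 256; i++) {
  let r = i << 24;
  for (let j = 0; j < 8; j++) {
    r = r & 0x80000000 ? (r << 1) ^ 0x04c11db7 : r << 1;
  }
  CRC_TABLE[i] = r >>> 0;
}

function oggCrc(buf) {
  let crc = 0;
  for (const byte of buf) {
    crc = ((crc << 8) ^ CRC_TABLE[((crc >>> 24) ^ byte) & 0xff]) >>> 0;
  }
  return crc;
}

// Header-type flags
const CONTINUED = 0x01; // page continues a packet from the previous page
const BOS = 0x02;       // first page of the logical bitstream
const EOS = 0x04;       // last page of the logical bitstream

function buildOggPage(packets, { serial, sequence, granulepos, flags = 0 }) {
  // Lacing: each packet is split into 255-byte segments plus one final value < 255
  const lacing = [];
  for (const p of packets) {
    let n = p.length;
    while (n >= 255) {
      lacing.push(255);
      n -= 255;
    }
    lacing.push(n);
  }
  if (lacing.length > 255) throw new Error('too many segments for one page');

  const header = Buffer.alloc(27 + lacing.length);
  header.write('OggS', 0, 'ascii');              // capture pattern
  header.writeUInt8(0, 4);                       // stream structure version
  header.writeUInt8(flags, 5);                   // CONTINUED / BOS / EOS
  header.writeBigInt64LE(BigInt(granulepos), 6); // granule position
  header.writeUInt32LE(serial, 14);              // bitstream serial number
  header.writeUInt32LE(sequence, 18);            // page sequence number
  header.writeUInt32LE(0, 22);                   // CRC field is zero while hashing
  header.writeUInt8(lacing.length, 26);          // number of segments
  Buffer.from(lacing).copy(header, 27);          // segment (lacing) table

  const page = Buffer.concat([header, ...packets]);
  page.writeUInt32LE(oggCrc(page), 22);          // fill in the real checksum
  return page;
}

const firstPage = buildOggPage([Buffer.alloc(19)], { serial: 0x1234, sequence: 0, granulepos: 0, flags: BOS });
console.log(firstPage.length, CONTINUED, EOS);
```

Used in order, that would be:

1. One BOS page holding OpusHead (granulepos 0).
2. One page holding OpusTags (granulepos 0).
3. Pages of audio packets whose granulepos is the running 48 kHz sample count, with EOS set on the last page.

The concatenated pages form the Ogg stream.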