Non-audible videos with libwebm (VP8/Opus) -- Syncing audio --


I am trying to create a very simple WebM (VP8/Opus) encoder, but I cannot get the audio to work.

ffprobe does detect the file format and duration, and reports the audio stream as:

Stream #1:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)

The video plays fine in VLC and Chrome, but with no audio. For some reason, the audio input bitrate reported by VLC is always 0.

Most of the audio encoding code was copied from https://github.com/fnordware/AdobeWebM/blob/master/src/premiere/WebM_Premiere_Export.cpp

Here is the relevant code:

static const long long kTimeScale = 1000000000LL;

MkvWriter writer;
writer.Open("video.webm");

Segment mux_seg;
mux_seg.Init(&writer);

// VPX encoding...

int16_t pcm[SAMPLES];
uint64_t audio_track_id = mux_seg.AddAudioTrack(SAMPLE_RATE, 1, 0);
mkvmuxer::AudioTrack *audioTrack = (mkvmuxer::AudioTrack*)mux_seg.GetTrackByNumber(audio_track_id);
audioTrack->set_codec_id(mkvmuxer::Tracks::kOpusCodecId);
audioTrack->set_seek_pre_roll(80000000);
OpusEncoder *encoder = opus_encoder_create(SAMPLE_RATE, 1, OPUS_APPLICATION_AUDIO, NULL);
opus_encoder_ctl(encoder, OPUS_SET_BITRATE(64000));
opus_int32 skip = 0;
opus_encoder_ctl(encoder, OPUS_GET_LOOKAHEAD(&skip));
audioTrack->set_codec_delay(skip * kTimeScale / SAMPLE_RATE);
mux_seg.CuesTrack(audio_track_id);
uint64_t currentAudioSample = 0;
uint64_t opus_ts = 0;
while(has_frame) {
  int bytes = opus_encode(encoder, pcm, SAMPLES, out, SAMPLES * 8);
  opus_ts = currentAudioSample * kTimeScale / SAMPLE_RATE;
  mux_seg.AddFrame(out, bytes, audio_track_id, opus_ts, true);
  currentAudioSample += SAMPLES;
}

opus_encoder_destroy(encoder);
mux_seg.Finalize();
writer.Close();

Update #1: It seems the problem is that WebM requires the audio and video tracks to be interleaved. However, I cannot figure out how to sync the audio. Should I calculate the video frame duration and then encode the equivalent number of audio samples?
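
To make the arithmetic behind that question concrete, here is a minimal sketch (not libwebm code) of the order in which packets would be written if each video frame is followed by the Opus packets that start before the next video frame. The names `Packet`, `SamplesToNanos`, and `InterleaveOrder` are hypothetical helpers introduced only for illustration; the sketch assumes a constant frame rate and fixed-size Opus packets.

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kNanosPerSecond = 1000000000ULL;

// Timestamp in nanoseconds of the audio packet that starts at sample_index.
// 64-bit math is needed: sample_index * 1e9 overflows 32 bits almost at once.
uint64_t SamplesToNanos(uint64_t sample_index, uint32_t sample_rate) {
  return sample_index * kNanosPerSecond / sample_rate;
}

// Hypothetical record of one muxed packet (illustration only).
struct Packet {
  bool is_audio;
  uint64_t pts_ns;
};

// Returns the write order: one video frame, then every audio packet whose
// start time falls before the next video frame.
std::vector<Packet> InterleaveOrder(uint32_t video_frames, uint32_t fps,
                                    uint32_t sample_rate,
                                    uint32_t samples_per_packet) {
  std::vector<Packet> order;
  const uint64_t frame_ns = kNanosPerSecond / fps;
  uint64_t audio_samples = 0;
  for (uint32_t i = 0; i < video_frames; ++i) {
    const uint64_t video_pts = i * frame_ns;
    order.push_back({false, video_pts});
    while (SamplesToNanos(audio_samples, sample_rate) < video_pts + frame_ns) {
      order.push_back({true, SamplesToNanos(audio_samples, sample_rate)});
      audio_samples += samples_per_packet;
    }
  }
  return order;
}
```

With 30 FPS video and 960-sample packets at 48000 Hz, `InterleaveOrder(3, 30, 48000, 960)` yields video at 0, audio at 0 and 20 ms, video at about 33.3 ms, audio at 40 and 60 ms, video at about 66.7 ms, and audio at 80 ms, so the timestamps never go backwards.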


Solution

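  • The problem was twofold: the audio track was missing the Opus header data in its CodecPrivate (the same identification header that Ogg Opus files start with), and the audio frame timestamps were not accurate.

    To complete the answer, here is the pseudocode for the encoder: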

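    const uint64_t kNanosPerSecond = 1000000000ULL; // WebM timestamps are in nanoseconds
    const uint64_t kFrameDuration = kNanosPerSecond / FPS; // duration of one video frame
    const uint64_t kOpusFrameSamples = 960; // 20 ms of audio at 48000 Hz

    init_opus_encoder();
    audioTrack->set_seek_pre_roll(80000000); // 80 ms, in nanoseconds
    // codec delay is in nanoseconds: lookahead samples * kNanosPerSecond / 48000
    audioTrack->set_codec_delay(opus_preskip);
    audioTrack->SetCodecPrivate(ogg_header_data, ogg_header_size);

    uint64_t frame_index = 0;
    uint64_t curr_audio_samples = 0;
    uint64_t audio_pts = 0;

    while(has_video_frame) {
      encode_vpx_frame();
      video_pts = frame_index * kFrameDuration;
      muxer_segment.AddFrame(frame_packet_data, packet_length, video_track_id, video_pts,
                             (packet_flags & VPX_FRAME_IS_KEY) != 0);
      // fill the gap until the next video frame with Opus audio frames
      while(audio_pts < video_pts + kFrameDuration) {
        encode_opus_frame();
        muxer_segment.AddFrame(opus_frame_data, opus_frame_data_length, audio_track_id, audio_pts, true /* keyframe */);
        // advance the sample count first, so each frame gets a distinct timestamp
        curr_audio_samples += kOpusFrameSamples;
        audio_pts = curr_audio_samples * kNanosPerSecond / 48000;
      }
      frame_index++;
    }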
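
The pseudocode above leaves out how `ogg_header_data` is produced. As a hedged sketch, the 19-byte identification header ("OpusHead") described in RFC 7845, section 5.1, can be built by hand for mono or stereo streams. `MakeOpusHead` is a hypothetical helper name, not part of libwebm or libopus, and the sketch assumes channel mapping family 0 and zero output gain.

```cpp
#include <cstdint>
#include <vector>

// Builds the 19-byte OpusHead identification header (RFC 7845, section 5.1).
// Assumes channel mapping family 0 (mono or stereo) and an output gain of 0.
// Multi-byte fields are little-endian.
std::vector<uint8_t> MakeOpusHead(uint8_t channels, uint16_t pre_skip,
                                  uint32_t input_sample_rate) {
  std::vector<uint8_t> head = {'O', 'p', 'u', 's', 'H', 'e', 'a', 'd'};
  head.push_back(1);         // version
  head.push_back(channels);  // output channel count
  head.push_back(static_cast<uint8_t>(pre_skip & 0xFF));
  head.push_back(static_cast<uint8_t>(pre_skip >> 8));
  for (int shift = 0; shift < 32; shift += 8) {
    head.push_back(static_cast<uint8_t>((input_sample_rate >> shift) & 0xFF));
  }
  head.push_back(0);  // output gain, low byte
  head.push_back(0);  // output gain, high byte
  head.push_back(0);  // channel mapping family 0
  return head;
}
```

The result would be handed to the muxer with `audioTrack->SetCodecPrivate(head.data(), head.size())`, where `pre_skip` is the sample count returned by `OPUS_GET_LOOKAHEAD` in the question's code.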