Tags: image, rust, frame, h.264, mpeg

Extract raw I frame image data from MPEG-2 Transport Stream (H.264 - Annex B) byte stream


Context

I'm attempting to extract raw image data for each I-frame from an MPEG-2 Transport Stream carrying an H.264 (Annex B) elementary stream. The video contains an I-frame every 2 seconds. I've read that an I-frame can be found after a NALU start code whose type is 5 (i.e. a coded slice of an IDR picture). The byte payload of these NALUs contains all the data needed to construct a full frame, albeit, to my understanding, still in H.264-encoded form.
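
As background for the PoC below: the NAL unit type sits in the low 5 bits of the header byte that immediately follows an Annex B start code. A minimal sketch of that check (my own illustration, not part of the original PoC):

/// Returns the NAL unit type from the header byte that follows an
/// Annex B start code (0x000001 or 0x00000001). The header byte is
/// forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), nal_unit_type (5 bits);
/// type 5 is a coded slice of an IDR picture.
fn nal_unit_type(header_byte: u8) -> u8 {
    header_byte & 0x1F
}

fn main() {
    // 0x65 (101 decimal): nal_ref_idc = 3, nal_unit_type = 5 (IDR slice)
    assert_eq!(nal_unit_type(0x65), 5);
}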

I would like to build a solution that extracts these I-frames from an incoming byte stream by finding the NALUs that contain them, saving each payload, and decoding it into some ubiquitous raw image format so I can access pixel data etc.

Note: I would like to avoid depending on external binaries such as ffmpeg if possible and, more importantly, if feasible!

PoC

So far I have built a PoC in Rust that finds the byte offset and byte size of each I-frame:

use std::fs::File;
use std::io::{prelude::*, BufReader};
extern crate image;

fn main() {
    let file = File::open("vodpart-0.ts").unwrap();
    let reader = BufReader::new(file);

    let mut idr_payload = Vec::<u8>::new();
    let mut total_idr_frame_count = 0;
    let mut is_idr_payload = false;
    let mut is_nalu_type_code = false;
    let mut start_code_vec = Vec::<u8>::new();

    for (pos, byte_result) in reader.bytes().enumerate() {
        let byte = byte_result.unwrap();

        // The previous byte completed a start code, so this byte is the NAL header.
        if is_nalu_type_code {
            is_idr_payload = false;
            is_nalu_type_code = false;
            start_code_vec.clear();
            // 0x65 (101): nal_ref_idc = 3, nal_unit_type = 5 (coded slice of an IDR picture)
            if byte == 101 {
                is_idr_payload = true;
                total_idr_frame_count += 1;
                println!("Found IDR picture at byte offset {}", pos);
            }
            continue;
        }

        // Accumulate bytes while inside an IDR NALU payload.
        if is_idr_payload {
            idr_payload.push(byte);
        }

        // Track runs of zero bytes that may form a start code prefix.
        if byte == 0 {
            start_code_vec.push(byte);
            continue;
        }

        // A 0x01 following two or more zeroes completes a start code,
        // terminating the current NALU.
        if byte == 1 && start_code_vec.len() >= 2 {
            if is_idr_payload {
                let payload = idr_payload.len() - start_code_vec.len() + 1;
                println!("Previous NALu payload is {} bytes long\n", payload);
                save_image(idr_payload.as_slice(), total_idr_frame_count);
                idr_payload.clear();
            }
            is_nalu_type_code = true;
            continue;
        }

        start_code_vec.clear();
    }

    println!();
    println!("total i frame count: {}", total_idr_frame_count);

    println!();
    println!("done!");
}

// Interpret the accumulated payload as raw RGB8 pixels and save it as a JPEG.
fn save_image(buffer: &[u8], index: u16) {
    let image_name = format!("image-{}.jpg", index);
    image::save_buffer(image_name, buffer, 858, 480, image::ColorType::Rgb8).unwrap()
}

The output looks like this:

Found IDR picture at byte offset 870
Previous NALu payload is 202929 bytes long

Found IDR picture at byte offset 1699826
Previous NALu payload is 185069 bytes long

Found IDR picture at byte offset 3268686
Previous NALu payload is 145218 bytes long

Found IDR picture at byte offset 4898270
Previous NALu payload is 106114 bytes long

Found IDR picture at byte offset 6482358
Previous NALu payload is 185638 bytes long


total i frame count: 5

done!

This is correct: based on my research using H.264 bitstream viewers etc., there are definitely 5 I-frames at those byte offsets!

The issue is that I don't understand how to convert the H.264 bytestream payload into a raw RGB image format. The resulting images, once converted to JPEG, are just a fuzzy mess that takes up roughly 10% of the image area.

For example:

[Output JPEG image]
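
To put numbers on that observation, here is a back-of-the-envelope check using the payload sizes printed above:

fn main() {
    // A raw RGB24 frame at the stream's 858x480 resolution needs this many bytes:
    let raw_frame_bytes = 858 * 480 * 3;
    assert_eq!(raw_frame_bytes, 1_235_520);
    // The IDR payloads found above are 106_114..=202_929 bytes of *compressed*
    // H.264 data, i.e. only ~9-16% of one raw frame, which is why writing them
    // straight into an RGB buffer fills just a small part of the image.
    println!("one raw 858x480 RGB frame = {} bytes", raw_frame_bytes);
}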

Questions

  1. Is there a decoding step that needs to be performed?
  2. Am I approaching this correctly, and is this feasible to attempt myself, or should I rely on another library?

Any help would be greatly appreciated!


Solution

  • “Is there a decoding step that needs to be performed?”

    Yes. And writing a decoder from scratch is EXTREMELY complicated. The document that describes it (ISO/IEC 14496-10) is over 750 pages long. You should use a library. libavcodec from the ffmpeg project is really your only option (unless you only need the Baseline profile, in which case you can use the open-source decoder from Android). See the sketch below for one way to drive it from Rust.

    You can compile a custom version of libavcodec to exclude things you don’t need.
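
    For illustration, here is a minimal sketch of that approach using the ffmpeg-next Rust bindings for libavformat/libavcodec. It is modelled on that crate's dump-frames example; treat the exact call names (Context::from_parameters, receive_frame, is_key, etc.) as assumptions to verify against the crate version you use. It demuxes the .ts, decodes every video packet, keeps only keyframes, and converts them to packed RGB24, which you could pass to your existing save_image:

extern crate ffmpeg_next as ffmpeg;

use ffmpeg::format::{input, Pixel};
use ffmpeg::media::Type;
use ffmpeg::software::scaling::{context::Context as Scaler, flag::Flags};
use ffmpeg::util::frame::video::Video;

fn main() -> Result<(), ffmpeg::Error> {
    ffmpeg::init()?;

    // Let libavformat handle TS demuxing instead of scanning for start codes.
    let mut ictx = input(&"vodpart-0.ts")?;
    let video_stream = ictx
        .streams()
        .best(Type::Video)
        .ok_or(ffmpeg::Error::StreamNotFound)?;
    let video_index = video_stream.index();

    // Build an H.264 decoder from the stream's codec parameters.
    let ctx = ffmpeg::codec::context::Context::from_parameters(video_stream.parameters())?;
    let mut decoder = ctx.decoder().video()?;

    // Converter from the decoder's native pixel format (usually YUV420P) to RGB24.
    let mut scaler = Scaler::get(
        decoder.format(),
        decoder.width(),
        decoder.height(),
        Pixel::RGB24,
        decoder.width(),
        decoder.height(),
        Flags::BILINEAR,
    )?;

    let mut idr_index = 0u16;
    for (stream, packet) in ictx.packets() {
        if stream.index() != video_index {
            continue;
        }
        decoder.send_packet(&packet)?;

        let mut decoded = Video::empty();
        while decoder.receive_frame(&mut decoded).is_ok() {
            // Only keep keyframes (the I/IDR pictures you were locating by hand).
            if !decoded.is_key() {
                continue;
            }
            let mut rgb = Video::empty();
            scaler.run(&decoded, &mut rgb)?;
            idr_index += 1;
            // rgb.data(0) now holds packed RGB24 pixels -- the kind of buffer
            // your save_image expects, e.g.:
            // save_image(rgb.data(0), idr_index);
            println!("decoded keyframe {} ({}x{})", idr_index, rgb.width(), rgb.height());
        }
    }
    decoder.send_eof()?;
    // (A complete program would drain any remaining frames after send_eof.)
    Ok(())
}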