python file video bit-manipulation mjpeg

Printing an MJPEG frame

I am trying to write an MJPEG streamer. The 5 bytes preceding each frame give its size, so I can then extract the frame. To check that I got the correct frame, I am trying to write it to a .jpeg file, but that doesn't work. Am I doing it correctly?

import os
from array import array

class VideoStream:
    def __init__(self,filename):
        self.fis = open(filename,'r')
        self.frame_nb = 0

    def getnextframe(self):
        length = 0
        frame_length = bytearray(5)

        frame_length = self.fis.read(5)
        fm = array('B',frame_length[:5])

        length = fm[4]+((fm[3]<<8)&0xFF)+((fm[2]<<16)&0xFF)+((fm[1]<<24)&0xFF)+((fm[0]<<32)&0xFF)

        frame = self.fis.read(length)
        print 'len=',length

        test = open("test.jpeg",'w')
        test.write(frame)
        test.close()
        print 'frame=',frame



if __name__=='__main__':
    vs = VideoStream("Movie.mjpeg")
    vs.getnextframe()

Solution

The length in your code is effectively just the value of the fifth byte. Every other summand is shifted at least 8 bits to the left and then masked with 0xFF, which keeps only the lowest 8 bits. After the shift those bits are all zero, so each of these terms contributes nothing to the sum.
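To make the masking problem concrete, here is a small sketch that compares the expression from the question with a plain big-endian combination of the same five bytes. The function names are made up for illustration, and treating the header as a five-byte binary big-endian integer is only an assumption about what the question intended.

```python
def length_as_in_question(fm):
    # Same expression as in the question: each shifted term is masked with
    # 0xFF, which keeps only the low 8 bits, and those bits are always zero
    # after a left shift by 8 or more.
    return (fm[4] + ((fm[3] << 8) & 0xFF) + ((fm[2] << 16) & 0xFF)
            + ((fm[1] << 24) & 0xFF) + ((fm[0] << 32) & 0xFF))


def length_big_endian(fm):
    # Shift the accumulated value and add the next byte, without masking
    # the shifted result.
    length = 0
    for byte in fm[:5]:
        length = (length << 8) | (byte & 0xFF)
    return length


header = bytearray([0x00, 0x00, 0x00, 0x12, 0x34])
print(length_as_in_question(header))  # 52 (0x34, only the last byte)
print(length_big_endian(header))      # 4660 (0x1234)
```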

A simple additional test one may implement is to check whether the frame data starts with a JPEG start of image marker (FF D8) and ends with an end of image marker (FF D9).

The following function iterates over JPEG images that are each preceded by a frame length written as five ASCII characters, like the VideoStream.java you were porting to Python:

def iter_frames(filename):
    with open(filename, 'rb') as mjpeg_file:
        while True:
            frame_length_bytes = mjpeg_file.read(5)
            if len(frame_length_bytes) != 5:
                if frame_length_bytes:
                    raise ValueError('incomplete length')
                else:
                    break
            frame_length = int(frame_length_bytes)
            frame = mjpeg_file.read(frame_length)
            if len(frame) != frame_length:
                raise ValueError('incomplete frame data')
            if not (
                frame.startswith(b'\xff\xd8') and frame.endswith(b'\xff\xd9')
            ):
                raise ValueError('invalid jpeg')

            yield frame


def main():
    frames = iter_frames('Movie.mjpeg')
    frame = next(frames)
    with open('test.jpg', 'wb') as jpeg_file:
        jpeg_file.write(frame)



if __name__ == '__main__':
    main()

It checks that both the length prefix and the JPEG data are complete, and that the JPEG start and end markers are present.
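For testing without a real movie file, a file in the same layout can be produced by hand. The following is a rough sketch of the writing side; `write_frames` is a hypothetical helper, and zero-padding the five-digit length is an assumption about the format (`int()` would accept space padding as well).

```python
import io


def write_frames(mjpeg_file, frames):
    """Write each frame as '<5 ASCII digits><frame data>' to a binary file."""
    for frame in frames:
        if len(frame) > 99999:
            raise ValueError('frame too large for a five-digit length')
        # Assumption: the length is zero-padded to exactly five digits.
        mjpeg_file.write(('%05d' % len(frame)).encode('ascii'))
        mjpeg_file.write(frame)


# Not a decodable image, just the start and end markers around a payload.
fake_frame = b'\xff\xd8' + b'abc' + b'\xff\xd9'
buffer = io.BytesIO()
write_frames(buffer, [fake_frame, fake_frame])
print(repr(buffer.getvalue()[:5]))  # the ASCII digits 00007
```

Writing the buffer's content to a file opened with 'wb' gives something a reader for this length-prefixed layout can consume.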

This is probably much simpler than you thought. But there is a catch: that format was very likely made up by the author of that Java class.

MJPEG is just a video codec, which is basically just concatenated JPEG images. It very rarely comes in that "raw" form, though. Usually it is embedded in a container format that carries meta information such as the fact that the data is MJPEG, the frame rate, maybe audio, and so on.

One such format is AVI, as in the example MJPEG avi you've referenced in a comment.

Extracting the frames from such a file into single JPEG images is a bit more work than reading JPEG images that are prefixed with a simple length field and then concatenated in one file. One needs to implement an AVI reader that understands enough about the AVI format to get at the frame data, and then a JPEG reader that understands enough of the JPEG format to read a full frame, as they are saved back to back without any length information.
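As an illustration of the second half of that work, here is a deliberately naive sketch that splits a byte string of back-to-back JPEG images by scanning for the start and end markers. It is a heuristic and not a real JPEG reader: it does not parse the container, and it would cut in the wrong place if a frame embeds another JPEG (for example a thumbnail in its metadata). The function name is hypothetical.

```python
def split_concatenated_jpegs(data):
    """Yield byte strings running from each SOI marker to the next EOI marker."""
    position = 0
    while True:
        start = data.find(b'\xff\xd8', position)
        if start < 0:
            break  # no further start of image marker
        end = data.find(b'\xff\xd9', start + 2)
        if end < 0:
            raise ValueError('start marker without end marker')
        yield data[start:end + 2]
        position = end + 2


first = b'\xff\xd8' + b'one' + b'\xff\xd9'
second = b'\xff\xd8' + b'two' + b'\xff\xd9'
print(len(list(split_concatenated_jpegs(first + second))))  # 2
```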

The next problem is that not all MJPEGs contain frames which are usable as standalone JPEG images. Some are missing a data table (the Huffman table) needed for decompressing the image data. There is a fixed table in the AVI specification for the MJPEG codec. That table is used by software for decoding, and it has to be injected into the frame when saving it as a JPEG file.
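Whether a given frame carries its own Huffman table can be checked by walking the marker segments in front of the image data. The sketch below assumes the usual JPEG layout (start of image marker, then segments that each consist of FF, a marker byte, and a two-byte big-endian length that includes the length field itself) and stops at the start of scan marker. The function name is made up, and fill bytes or other unusual layouts are not handled.

```python
import struct


def has_huffman_table(frame):
    """Return True if a DHT segment (FF C4) appears before the scan data."""
    if not frame.startswith(b'\xff\xd8'):
        raise ValueError('missing start of image marker')
    position = 2
    while position + 4 <= len(frame):
        if frame[position:position + 1] != b'\xff':
            raise ValueError('expected a marker at offset %d' % position)
        marker = frame[position + 1:position + 2]
        if marker == b'\xc4':  # DHT: define Huffman table
            return True
        if marker == b'\xda':  # SOS: compressed image data starts here
            return False
        # The segment length counts its own two bytes but not the marker.
        (segment_length,) = struct.unpack('>H', frame[position + 2:position + 4])
        position += 2 + segment_length
    return False
```

A frame for which this returns False is a candidate for the table injection described above.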

One last thing: there are interlaced videos which don't contain full images. Instead, each image contains every other row, and two consecutive images need to be combined into one. Your given example MJPEG avi is such a video. When extracting the frames without decoding, deinterlacing, and reencoding, each image is only half as high as the video height.
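Combining two such half-height images means interleaving their rows. The following sketch shows that step on already decoded images, represented here simply as lists of rows. It assumes the first field holds the top row, which depends on the video, and `weave_fields` is a hypothetical name.

```python
def weave_fields(first_field, second_field):
    """Interleave the rows of two fields into one full-height frame."""
    if len(first_field) != len(second_field):
        raise ValueError('fields must have the same number of rows')
    frame = []
    for first_row, second_row in zip(first_field, second_field):
        # Assumption: the first field carries rows 0, 2, 4, ...
        frame.append(first_row)
        frame.append(second_row)
    return frame


print(weave_fields(['row 0', 'row 2'], ['row 1', 'row 3']))
# ['row 0', 'row 1', 'row 2', 'row 3']
```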

To get a better idea of what the single images look like, this ffmpeg command line extracts the frame data and injects the missing data table in order to get standalone JPEG images:

ffmpeg -i bowlerhatdancer.sleepytom.SGP.mjpeg.avi \
   -c:v copy -bsf:v mjpeg2jpeg frame_%04d.jpg