How do I serialize video frames for streaming over UDP?

I am experimenting with video streaming over UDP. I capture my screen using vidgear and serialize the frames with pickle. I am trying to build a remote desktop solution, which requires low latency, but I found that pickle is too slow for this purpose. Are there any other serialization frameworks that can serialize video frames? I found flatbuffers and protobuf, but I am not sure how to use them for video.

So it would be greatly appreciated if someone could point me in the right direction, i.e., suggest a fast serialization framework.

Thanks in advance! :)


  • I see you have already resolved it, but in the meantime I made an example.

    You can use tobytes() to convert a NumPy array to bytes, which you can then send over a socket.

        byte_data = arr.tobytes()

    You can also use struct to pack the length into 4 bytes, or height, width, depth into 12 bytes:

        size = len(byte_data)
        byte_size = struct.pack('I', size)
        height, width, depth = arr.shape  # NumPy shape order is (rows, columns, channels)
        byte_height_width_depth = struct.pack('III', height, width, depth)

    Then you can send the data prefixed either with size or with height, width, depth:

        all_bytes = byte_size + byte_data
        # or
        all_bytes = byte_height_width_depth + byte_data

    On the client, you can first read the 4 bytes holding the size:

        byte_size = recv(4)
        size, = struct.unpack('I', byte_size)  # unpack always returns a tuple

    or 12 bytes if you sent height, width, depth:

        byte_height_width_depth = recv(12)
        height, width, depth = struct.unpack('III', byte_height_width_depth)

    and then you know how many bytes the frame has:

        byte_data = recv(size)
        arr = np.frombuffer(byte_data, dtype=np.uint8)

    With height, width, depth you also know how to reshape it:

        byte_data = recv(height*width*depth)
        arr = np.frombuffer(byte_data, dtype=np.uint8)
        arr = arr.reshape((height, width, depth))
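
    The 'I' format used above packs values in the machine's native byte order, so sender and receiver could disagree if they run on different architectures. A minimal sketch, assuming you are free to choose the wire format, uses the '!' prefix (network byte order, standard sizes). HEADER, pack_frame and unpack_frame are hypothetical names, and the frame is assumed to be a 3-dimensional uint8 array:

```python
import struct

import numpy as np

# Assumed wire format: three unsigned 32-bit ints in network byte order.
HEADER = struct.Struct('!III')


def pack_frame(arr):
    """Prefix the raw pixel bytes with height, width, depth."""
    height, width, depth = arr.shape
    return HEADER.pack(height, width, depth) + arr.tobytes()


def unpack_frame(buf):
    """Rebuild the array from a buffer produced by pack_frame()."""
    height, width, depth = HEADER.unpack_from(buf, 0)
    count = height * width * depth
    # Read the pixels directly after the header, without slicing a copy.
    flat = np.frombuffer(buf, dtype=np.uint8, count=count, offset=HEADER.size)
    return flat.reshape((height, width, depth))


frame = np.arange(4 * 2 * 3, dtype=np.uint8).reshape((4, 2, 3))
packet = pack_frame(frame)
restored = unpack_frame(packet)

print(HEADER.size)                      # 12
print(len(packet))                      # 36 (12 header bytes + 24 pixel bytes)
print(np.array_equal(frame, restored))  # True
```

    Note that np.frombuffer() on a bytes object returns a read-only array; call .copy() on the result if the receiver needs to modify the frame.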

    If your frames always have the same height, width, depth, then you could send only the data without height, width, depth, or even without size, and use hardcoded values in the code.

    But if you plan to send frames compressed to JPG or PNG, which may produce a different number of bytes for each frame, then you will need to send the size as the first value.
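
    The size prefix for compressed frames can be sketched as follows. This is only an illustration under assumptions: zlib from the standard library stands in for a real JPG or PNG encoder, and encode_frame and decode_frame are hypothetical names. The point is that the payload length varies from frame to frame, so it has to be sent first:

```python
import struct
import zlib

import numpy as np

HEADER_FORMAT = '!IIII'  # assumed layout: payload size, height, width, depth


def encode_frame(arr):
    """Compress the frame and prefix it with payload size and shape."""
    payload = zlib.compress(arr.tobytes())  # stand-in for JPG/PNG encoding
    height, width, depth = arr.shape
    # zlib does not store the shape, so it travels in the header here.
    header = struct.pack(HEADER_FORMAT, len(payload), height, width, depth)
    return header + payload


def decode_frame(buf):
    """Read the header, then exactly `size` bytes of compressed payload."""
    size, height, width, depth = struct.unpack_from(HEADER_FORMAT, buf, 0)
    start = struct.calcsize(HEADER_FORMAT)
    raw = zlib.decompress(buf[start:start + size])
    return np.frombuffer(raw, dtype=np.uint8).reshape((height, width, depth))


# Same shape, very different compressed sizes.
flat_frame = np.zeros((48, 64, 3), dtype=np.uint8)
noisy_frame = np.random.default_rng(0).integers(
    0, 256, size=(48, 64, 3), dtype=np.uint8)

for frame in (flat_frame, noisy_frame):
    packet = encode_frame(frame)
    print(len(packet), np.array_equal(decode_frame(packet), frame))
```

    Both frames have the same shape, yet the packets differ in length, which is why a hardcoded size would not work here.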

    Using pickle you get more bytes because it also sends the information about the numpy.ndarray class that is needed to reconstruct it.

    Using tobytes() you have to reconstruct the array on your own.

    Example code - it simulates send and recv.

        import numpy as np
        import struct
        import pickle

        # --- simulated socket ---

        internet = bytes()  # everything "sent" so far
        pointer = 0         # read position of the receiver

        def send(data):
            """Simulate socket send."""
            global internet
            internet += data

        def recv(size):
            """Simulate socket recv."""
            global pointer
            data = internet[pointer:pointer+size]
            pointer += size
            return data

        def send_frame(arr):
            # header: height, width, depth as three 4-byte unsigned ints
            byte_height_width_depth = struct.pack('III', *arr.shape)
            byte_data = arr.tobytes()
            all_bytes = byte_height_width_depth + byte_data
            print('all_bytes size:', len(all_bytes))
            print('all_bytes data:', all_bytes)
            send(all_bytes)

        def recv_frame():
            byte_height_width_depth = recv(12)
            height, width, depth = struct.unpack('III', byte_height_width_depth)
            byte_data = recv(height*width*depth)
            arr = np.frombuffer(byte_data, dtype=np.uint8).reshape((height, width, depth))
            return arr

        # --- main ---

        arr = np.array([
                [[255, 255, 255], [255, 255, 255]],
                [[255,   0,   0], [  0,   0, 255]],
                [[255,   0,   0], [  0,   0, 255]],
                [[255, 255, 255], [255, 255, 255]],
        ], dtype=np.uint8)

        print('--- pickle ---')
        data = pickle.dumps(arr)
        print('pickle size:', len(data))
        print('pickle data:', data)
        arr = pickle.loads(data)

        print('--- send frame ---')
        send_frame(arr)

        print('--- recv frame ---')
        arr = recv_frame()
        print(arr)
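
    The simulated recv() above always returns exactly the requested number of bytes. A real stream socket gives no such guarantee: socket.recv(n) may return fewer than n bytes. A minimal sketch of a helper that loops until the full amount has arrived, assuming a stream (TCP-style) connection rather than UDP datagrams. recv_exact is a hypothetical name, and socket.socketpair() stands in for a real network connection:

```python
import socket
import struct


def recv_exact(sock, size):
    """Keep calling recv() until exactly `size` bytes have been read."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError('socket closed before the full message arrived')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)


# Two connected sockets in one process, standing in for sender and receiver.
sender, receiver = socket.socketpair()

payload = bytes(range(256)) * 16  # 4096 bytes of fake frame data
sender.sendall(struct.pack('!I', len(payload)) + payload)

# Read the 4-byte size prefix first, then exactly that many payload bytes.
size, = struct.unpack('!I', recv_exact(receiver, 4))
received = recv_exact(receiver, size)
print(size, received == payload)  # 4096 True

sender.close()
receiver.close()
```

    With UDP the situation is different: each receive call returns one whole datagram, so a frame that does not fit into a single datagram would have to be split into numbered chunks by the sender.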