matlab  parallel-processing  h.264  matlab-deployment  video-compression

Parallel Computing for video compression in MATLAB


I need some help with parallel programming in MATLAB. To be clear, I have never used parallelization techniques in any of my code before. I have a video compression engine, developed as part of my university project. It is a basic version of an H.264 video compression engine, and I have to apply the parallel processing techniques available in MATLAB to it. Basically, I have a function which divides an image frame into a number of blocks (predetermined by the size of the block), and I'm trying to partially or fully parallelize this part of the code. I have used "parfor" where there was no dependency between the blocks, and this worked out well; I have included that implementation below. Now I'm trying to parallelize a case where there are dependencies between blocks.

function [reconstructed_frames, residual_blocks, encoded_data_cell, bit_count_coeff_per_frame, bit_count_mv_per_frame_cell, real_avg_bit_count_per_row_per_frame, total_bit_count_per_frame, QP_used_in_row, scene_change_frames, SAD_value_per_frame] = block_prediction_parallalized(Y, block_size, srch_rng, QP, I_period,pathToResiduals, no_ref_frames, VBS_enable, Fast_ME_enable,Frac_ME_enable,lambda, RC_flag, avg_bit_count_row_vary_QP, target_bits_per_frame)
% Function to predict frames based on inter prediction and intra prediction,
% with the given I-period
Y = int64(Y);
[no_rows, no_cols, no_frames] = size(Y);
no_blocks_in_row = no_cols/block_size;
no_blocks_in_col = no_rows/block_size;
total_blocks_per_frame = (no_rows*no_cols)/(block_size*block_size);
encoded_data_cell = cell(1,total_blocks_per_frame,no_frames);
encoded_data_per_frame = cell(1, total_blocks_per_frame);
ref_frame_inter = zeros(no_rows, no_cols, 1, 'int64') + 128;
bit_count_coeff_per_frame = 0;
bit_count_mv_per_frame_cell = 0;
real_avg_bit_count_per_row_per_frame = 0;
QP_used_in_row = zeros(1,no_blocks_in_col,no_frames);
QP_used_in_row(:,:,:) = QP;
scene_change_frames = [];
SAD_value_per_frame = 0;
ref_frame_index_count = 1;
for k = 1:no_frames
    if k>1
        ref_frame_inter(:,:,1) = Y(:,:,k-1);
    end
    block_segment = 0;
    bitCountMV = 0;
     for row = 1 : block_size : no_rows - block_size + 1
         for col = 1 : block_size : no_cols - block_size + 1
            block_segment = block_segment + 1;
            row_start = row;
            row_end = row_start + block_size - 1;
            col_start = col;
            col_end = col_start + block_size - 1;
            row_end = min(row_end, no_rows);
            col_end = min(col_end, no_cols);
        
            % Making an array of blocks of size block_size
            block_list_currframe(:,:,block_segment) = Y(row_start:row_end, col_start:col_end, k);
            location_pointers(block_segment,:) = [row_start row_end col_start col_end];           
         end         
     end
     %Parallelizing the block encoding process
     max_index = size(block_list_currframe,3);
     %Loop for processing blocks concurrently
     parfor block_index = 1:max_index
        % Function for inter-prediction
        [encoded_data, reconstructed_block, residual_block, bit_count_per_block] = paral_debug_funct(block_index, location_pointers, block_list_currframe, ref_frame_inter, block_size, srch_rng, QP, no_rows, no_cols, ref_frame_index_count, VBS_enable, Fast_ME_enable, Frac_ME_enable, lambda);
        
        %Buffering the output of each worker
        reconstructed_blocks(:,:,block_index) = reconstructed_block;
        residual_blocks_in_frame(:,:,block_index) = residual_block;
        encoded_data_per_frame(:,:, block_index) = encoded_data;
        total_bit_count_per_block(block_index) = bit_count_per_block;
     end
     
     %Processing the buffered outputs obtained after processing all the
     %blocks.
     for block_index = 1:size(block_list_currframe,3)
%          [row_start, row_end, col_start, col_end] = location_pointers(block_index,:);
        row_start = location_pointers(block_index, 1);
        row_end = location_pointers(block_index, 2);
        col_start = location_pointers(block_index, 3);
        col_end = location_pointers(block_index, 4);
        reconstructed_frames(row_start:row_end, col_start:col_end, k) = reconstructed_blocks(:,:,block_index);
        residual_blocks(:,:,block_index,k) =  residual_blocks_in_frame(:,:,block_index);
        encoded_data_cell(:,:,block_index,k) = encoded_data_per_frame(:,:,block_index);
     end
     total_bit_count_per_frame(k) = sum(total_bit_count_per_block, 'all');
end

In the above code, the blocks don't have to communicate with each other. Now I need them to communicate at some point, because the processing of some blocks has to wait for a previous block to finish. I think the image below will help make it clearer.

[Image: Block dependencies]

I have learned that there are two types of parallel processing available: multi-threading and multi-processing. I think multi-threading is the better fit for my use case. I have read about spmd and parfeval, but the examples I've come across are usually not very detailed. As I am new to parallel processing, these options feel confusing, and it is difficult to choose which one to focus on. I think what I want is for the workers to be able to communicate with each other during execution, but I'm not sure. If you need a general idea of the data size: video frame size = 288x352 (CIF format), block size = 16, number of frames = 21.

Thank you!

P.S. Sorry for the long post, I was trying to explain it as clearly as possible.


Solution

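  • You can use a parfor inside a non-parallel for. Blocks of the same color are processed together in parallel, and the color groups are processed one after another, something like this:

    % Sketch only: extract_blocks_by_color, place_blocks and
    % process_based_on_previous_blocks are placeholders you have to write.
    previous_blocks = {};
    for color = ["green", "red", "blue"]
      % Cell array of the blocks with the same color from the frame
      input_blocks = extract_blocks_by_color(frame, color);
      processed_blocks = cell(1, numel(input_blocks));
      parfor i = 1:numel(input_blocks)
        processed_blocks{i} = process_based_on_previous_blocks(i, input_blocks{i}, previous_blocks);
      end
      previous_blocks = processed_blocks;
      % Put processed_blocks back in their original position in the frame
      frame = place_blocks(frame, processed_blocks, color);
    end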
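
The same pattern can be tried out on a toy problem. The sketch below is only an illustration under stated assumptions: since the dependency figure is not reproduced here, it assumes each block depends on its top and left neighbours, so blocks on the same anti-diagonal (a "wavefront") are independent of each other. The per-block work (block mean plus the larger neighbour result) is a stand-in for the real prediction step, and the names `wave`, `members`, `top`, `left` and `result` are made up for this example.

```matlab
% Toy frame with the CIF dimensions from the question; content is random.
no_rows = 288; no_cols = 352; block_size = 16;
frame = randi([0 255], no_rows, no_cols);
n_br = no_rows / block_size;          % number of block rows (18)
n_bc = no_cols / block_size;          % number of block columns (22)

% ASSUMPTION: block (r,c) needs the results of blocks (r-1,c) and (r,c-1).
% Blocks with the same r+c then do not depend on each other.
[c_idx, r_idx] = meshgrid(1:n_bc, 1:n_br);
wave = r_idx + c_idx - 1;             % wavefront number of every block
result = zeros(n_br, n_bc);           % one value per processed block

for w = 1:max(wave(:))                % wavefronts run one after another
    members = find(wave == w);        % linear indices of blocks in this wavefront
    n = numel(members);
    out = zeros(1, n);
    blocks = cell(1, n); top = zeros(1, n); left = zeros(1, n);
    % Gather the inputs first so the parfor body only reads sliced variables.
    for i = 1:n
        r = r_idx(members(i)); c = c_idx(members(i));
        rows = (r-1)*block_size + (1:block_size);
        cols = (c-1)*block_size + (1:block_size);
        blocks{i} = frame(rows, cols);
        if r > 1, top(i) = result(r-1, c); end
        if c > 1, left(i) = result(r, c-1); end
    end
    parfor i = 1:n                    % blocks within a wavefront run in parallel
        b = blocks{i};
        out(i) = mean(b(:)) + max(top(i), left(i));   % stand-in for real work
    end
    result(members) = out;            % make results visible to the next wavefront
end
```

With 16x16 blocks a CIF frame has only 396 blocks, and the first and last wavefronts contain very few of them, so the parallel overhead may outweigh the gain unless the per-block work is expensive. Regarding threads versus processes: on releases that support it (R2020a and later), `parpool("threads")` opens a thread-based pool that `parfor` can use without copying data between processes, although not every function is supported on thread workers.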