c++ image-processing vivado low-latency vivado-hls

Why is the latency so high for a simple RGB-to-gray conversion (Vivado HLS)?

I am working with images on Vivado HLS 2015.4.

I am getting a very high latency of around 311774 clock cycles, even though the program just takes two input images and converts them from RGB to gray. The overall latency is 311774 because each of the three stages (Axi2Mat, RGB2GRAY and Mat2AXI) takes 77-78k cycles.

Is there any way to pipeline these stages so that the final latency is around 78k?

I am attaching my code and synthesis report:

#include <hls_video.h>
#include <hls/hls_video_types.h>
#include "top.h"


void toGray(AXI_IN_STREAM &IN_STREAM_1, AXI_IN_STREAM &IN_STREAM_2, AXI_OUT_STREAM &OUT_STREAM_1, AXI_OUT_STREAM &OUT_STREAM_2, unsigned int cols, unsigned int rows){
    #pragma HLS INTERFACE axis port=IN_STREAM_1
    #pragma HLS INTERFACE axis port=OUT_STREAM_1

    #pragma HLS INTERFACE axis port=IN_STREAM_2
    #pragma HLS INTERFACE axis port=OUT_STREAM_2


    #pragma HLS RESOURCE core=AXI_SLAVE variable=rows metadata="-bus_bundle CONTROL"
    #pragma HLS RESOURCE core=AXI_SLAVE variable=cols metadata="-bus_bundle CONTROL"
    #pragma HLS RESOURCE core=AXI_SLAVE variable=return metadata="-bus_bundle CONTROL"

    #pragma HLS INTERFACE ap_stable port=rows
    #pragma HLS INTERFACE ap_stable port=cols

    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> inMat_1(rows, cols);
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> inMat_2(rows, cols);

    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> grayMat_1(rows, cols);
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> grayMat_2(rows, cols);


 // hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> outMat(rows, cols);

    hls::AXIvideo2Mat(IN_STREAM_1, inMat_1);
    hls::AXIvideo2Mat(IN_STREAM_2, inMat_2);

    hls::CvtColor<HLS_BGR2GRAY, HLS_8UC3, HLS_8UC1>(inMat_1, grayMat_1);
    hls::CvtColor<HLS_BGR2GRAY, HLS_8UC3, HLS_8UC1>(inMat_2, grayMat_2);
 // hls::EqualizeHist(grayMat, outMat );




    hls::Mat2AXIvideo(grayMat_1, OUT_STREAM_1);
    hls::Mat2AXIvideo(grayMat_2, OUT_STREAM_2);

}

Solution

UG902: Vivado Design Suite User Guide, p. 293: "Since the functions are already pipelined, adding the DATAFLOW optimization ensures the pipelined functions will execute in parallel."

So adding the #pragma HLS dataflow directive inside the body of toGray should let the three functions run concurrently, each processing one sample per clock and streaming its output to the next function. As a result, the latency should drop to roughly 77-78k cycles (which I assume is about cols*rows).
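To illustrate where the directive goes, here is a minimal sketch in plain C++. It is not the original design: readStage, grayStage, writeStage and toGraySketch are hypothetical stand-ins for hls::AXIvideo2Mat, hls::CvtColor and hls::Mat2AXIvideo, the std::queue FIFOs are not synthesizable, and the luma weights are an assumed integer approximation rather than whatever hls::CvtColor uses internally. A regular compiler ignores the HLS pragma, so this only checks the functional behavior and shows the pragma placement at the top of the function body.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical pixel type and FIFOs standing in for hls::Mat streams.
struct Rgb { uint8_t r, g, b; };
using RgbFifo  = std::queue<Rgb>;
using GrayFifo = std::queue<uint8_t>;

// Stand-in for hls::AXIvideo2Mat: push the input pixels into a FIFO.
static void readStage(const Rgb *in, int n, RgbFifo &out) {
    for (int i = 0; i < n; ++i) out.push(in[i]);
}

// Stand-in for hls::CvtColor: assumed integer luma weights (77+150+29 = 256).
static void grayStage(RgbFifo &in, GrayFifo &out) {
    while (!in.empty()) {
        Rgb p = in.front();
        in.pop();
        out.push(static_cast<uint8_t>((77 * p.r + 150 * p.g + 29 * p.b) >> 8));
    }
}

// Stand-in for hls::Mat2AXIvideo: drain the FIFO into the output buffer.
static void writeStage(GrayFifo &in, uint8_t *out) {
    int i = 0;
    while (!in.empty()) {
        out[i++] = in.front();
        in.pop();
    }
}

// Top-level function: the dataflow directive sits at the top of the body,
// next to any interface pragmas, and applies to the stage calls below it.
void toGraySketch(const Rgb *in, uint8_t *out, int n) {
#pragma HLS dataflow
    RgbFifo rgb;
    GrayFifo gray;
    readStage(in, n, rgb);    // in HLS: would overlap with the next two stages
    grayStage(rgb, gray);
    writeStage(gray, out);
}
```

In the code from the question, the same directive would sit alongside the existing INTERFACE and RESOURCE pragmas at the top of toGray, ahead of the hls::Mat declarations. Whether the tool actually reaches the expected latency is something to confirm in the synthesis report.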