I am working with images on Vivado HLS 2015.4.
I am getting a very high latency of around 311774 clock cycles, even though the program just takes two input images and converts them from RGB to gray. The overall latency is 311774 as I am getting a latency of 77-78k for each of the three stages: Axi2Mat, RGB2GRAY and Mat2AXI.
Is there any way to reduce this, for example by pipelining, so that the final latency is around 78k?
I am attaching my code and synthesis report:
#include <hls_video.h>
#include <hls/hls_video_types.h>
#include "top.h"
void toGray(AXI_IN_STREAM &IN_STREAM_1, AXI_IN_STREAM &IN_STREAM_2,
            AXI_OUT_STREAM &OUT_STREAM_1, AXI_OUT_STREAM &OUT_STREAM_2,
            unsigned int cols, unsigned int rows){
#pragma HLS INTERFACE axis port=IN_STREAM_1
#pragma HLS INTERFACE axis port=OUT_STREAM_1
#pragma HLS INTERFACE axis port=IN_STREAM_2
#pragma HLS INTERFACE axis port=OUT_STREAM_2
#pragma HLS RESOURCE core=AXI_SLAVE variable=rows metadata="-bus_bundle CONTROL"
#pragma HLS RESOURCE core=AXI_SLAVE variable=cols metadata="-bus_bundle CONTROL"
#pragma HLS RESOURCE core=AXI_SLAVE variable=return metadata="-bus_bundle CONTROL"
#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE ap_stable port=cols
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> inMat_1(rows, cols);
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> inMat_2(rows, cols);
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> grayMat_1(rows, cols);
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> grayMat_2(rows, cols);
    // hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> outMat(rows, cols);
    hls::AXIvideo2Mat(IN_STREAM_1, inMat_1);
    hls::AXIvideo2Mat(IN_STREAM_2, inMat_2);
    hls::CvtColor<HLS_BGR2GRAY, HLS_8UC3, HLS_8UC1>(inMat_1, grayMat_1);
    hls::CvtColor<HLS_BGR2GRAY, HLS_8UC3, HLS_8UC1>(inMat_2, grayMat_2);
    // hls::EqualizeHist(grayMat, outMat );
    hls::Mat2AXIvideo(grayMat_1, OUT_STREAM_1);
    hls::Mat2AXIvideo(grayMat_2, OUT_STREAM_2);
}
UG902 (Vivado Design Suite User Guide), p. 293: "Since the functions are already pipelined, adding the DATAFLOW optimization ensures the pipelined functions will execute in parallel."
So just adding the #pragma HLS dataflow directive to your code should ensure that you process one sample per clock, with dataflow between the functions. As a result, the latency should drop to 77-78k cycles (which I am assuming is roughly cols*rows).
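To show where the directive goes, here is a minimal sketch of a three-stage pipeline with #pragma HLS dataflow placed at the top of the function body. It is not the code from the question: Bgr, BgrStream, GrayStream, readStage, convertStage, writeStage and toGraySketch are hypothetical stand-ins (built on std::queue) for the hls::Mat pipeline, so that the sketch compiles without the Vivado HLS headers. The gray weights (77, 150, 29) are a common fixed-point approximation and are an assumption, not necessarily what hls::CvtColor uses.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical stand-ins for the AXI streams / hls::Mat objects in the question.
struct Bgr { std::uint8_t b, g, r; };
typedef std::queue<Bgr> BgrStream;
typedef std::queue<std::uint8_t> GrayStream;

// Stage 1: stand-in for hls::AXIvideo2Mat (copies n pixels into the pipeline).
static void readStage(BgrStream &in, BgrStream &out, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
        out.push(in.front());
        in.pop();
    }
}

// Stage 2: stand-in for hls::CvtColor<HLS_BGR2GRAY, ...>.
// Assumed weights: 0.299 R + 0.587 G + 0.114 B scaled by 256 (77 + 150 + 29 = 256).
static void convertStage(BgrStream &in, GrayStream &out, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
        Bgr p = in.front();
        in.pop();
        unsigned y = (77u * p.r + 150u * p.g + 29u * p.b) >> 8;
        out.push(static_cast<std::uint8_t>(y));
    }
}

// Stage 3: stand-in for hls::Mat2AXIvideo (copies n gray pixels to the output).
static void writeStage(GrayStream &in, GrayStream &out, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
        out.push(in.front());
        in.pop();
    }
}

// Top-level function: the dataflow directive sits inside the function body,
// next to the other pragmas and before the stage calls.
void toGraySketch(BgrStream &in, GrayStream &out, unsigned rows, unsigned cols) {
#pragma HLS dataflow
    BgrStream pixels;   // channel between stage 1 and stage 2
    GrayStream gray;    // channel between stage 2 and stage 3
    const unsigned n = rows * cols;
    readStage(in, pixels, n);
    convertStage(pixels, gray, n);
    writeStage(gray, out, n);
}
```

A regular C++ compiler ignores the pragma and runs the three stages one after another, so the sketch only demonstrates placement and functional behavior. In the code from the question, the same line would go inside toGray alongside the existing INTERFACE pragmas, and the overlap between stages only shows up in the synthesis report.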