I have a large cv::Mat with dimensions (100, 32768). I update it for every frame in a video stream. Before updating, I need to set everything back to zero, so I execute

    memset(myMat.data, 0, 100 * 32768 * sizeof(int));

which takes 5 ms on average.
Surprisingly (at least to me), in debug mode I get the same times, if not faster ones. While I'd appreciate an explanation of why this happens (Google gives me plenty of reasons, so I will eventually figure it out), what I really need is a faster alternative. Is there anything I can do?
DDR4 in the 3000 MT/s range caps out at a bit under 100 GB/s; DDR3-800 is 6.4 GB/s.
Your speed is about 12.5 MiB (100 × 32768 × 4 bytes) per 5 ms, or roughly 2.5 GiB/s.
So depending on your RAM, you might be near the maximum speed of your hardware; being within a factor of 2 of the theoretical peak ain't bad.
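To check whether the clear really is bandwidth-bound on your machine, a minimal benchmark sketch along these lines (matrix shape taken from the question; the repeat count is arbitrary) would give you the effective throughput:

    #include <opencv2/core.hpp>
    #include <chrono>
    #include <cstring>
    #include <cstdio>

    int main() {
        cv::Mat myMat(100, 32768, CV_32S);                      // same shape as in the question
        const size_t bytes = myMat.total() * myMat.elemSize();  // ~12.5 MiB

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < 100; ++i)                  // repeat to average out noise
            std::memset(myMat.data, 0, bytes);
        auto t1 = std::chrono::steady_clock::now();

        double secs = std::chrono::duration<double>(t1 - t0).count();
        std::printf("effective clear bandwidth: %.2f GB/s\n",
                    100.0 * bytes / secs / 1e9);
    }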
You are working on a modestly sparse array, so a non-contiguous buffer might be a better plan, depending on how your data is arranged (see the sketch below). Also, GPUs tend to have faster internal memory bandwidth than CPUs, so moving your work there could help.
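On the sparse-array point: if only a handful of rows are actually written each frame (an assumption, since the question does not say how the updates are distributed), you could track the dirty rows and zero only those. A hypothetical sketch:

    #include <opencv2/core.hpp>
    #include <vector>
    #include <cstring>

    struct SparseClearMat {
        cv::Mat mat{100, 32768, CV_32S, cv::Scalar(0)};
        std::vector<int> dirtyRows;                    // rows written this frame

        void writeRow(int r, const int* src) {         // example update path
            std::memcpy(mat.ptr<int>(r), src, mat.cols * sizeof(int));
            dirtyRows.push_back(r);
        }

        void clear() {                                 // zero only the touched rows
            for (int r : dirtyRows)
                std::memset(mat.ptr<int>(r), 0, mat.cols * sizeof(int));
            dirtyRows.clear();
        }
    };

The clear cost then scales with the number of rows touched per frame rather than the full 12.5 MiB.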
The problem could also be latency: clearing one buffer on another thread while the current frame uses the other might help.
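A rough sketch of that double-buffering idea, assuming each frame only needs one buffer at a time (the names here are made up for illustration):

    #include <opencv2/core.hpp>
    #include <array>
    #include <future>
    #include <cstring>

    int main() {
        // Two identical buffers: while one is being filled for the current
        // frame, the other is being zeroed on a background thread.
        std::array<cv::Mat, 2> buf = {
            cv::Mat(100, 32768, CV_32S, cv::Scalar(0)),
            cv::Mat(100, 32768, CV_32S, cv::Scalar(0))};
        std::array<std::future<void>, 2> clearDone;
        const size_t bytes = buf[0].total() * buf[0].elemSize();

        for (int frame = 0; frame < 1000; ++frame) {
            int cur = frame & 1;                       // alternate buffers each frame
            if (clearDone[cur].valid())
                clearDone[cur].wait();                 // its background clear must be finished

            // ... fill and consume buf[cur] for this frame ...

            unsigned char* data = buf[cur].data;       // launch the clear for later reuse
            clearDone[cur] = std::async(std::launch::async,
                                        [data, bytes] { std::memset(data, 0, bytes); });
        }
    }

Note this does not reduce the bytes moved; it only hides the 5 ms behind the work done on the other buffer.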
Massively reducing your memory usage, and making it more local, may have a larger impact than you expect; it is plausible that your non-zeroing code is already constrained by RAM speed.
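As a concrete (hypothetical) example of shrinking the footprint: if the per-cell values happen to fit in 16 bits, halving the element type halves the bytes every clear and every update has to move:

    #include <opencv2/core.hpp>

    int main() {
        // CV_16U instead of CV_32S: ~6.25 MiB instead of ~12.5 MiB per clear
        cv::Mat myMat(100, 32768, CV_16U, cv::Scalar(0));
        myMat.setTo(cv::Scalar::all(0));   // OpenCV's bulk fill, same effect as memset
    }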