The question is quite straightforward. I'll also explain what I do in case there is a faster way to do this without optimizing this specific way.
I go through an image and its rgb values. I have bins of size 256 for each color. So for every pixel I calculate the 3 bins of its rgb values. The bins essentially give me the index to access data for the specific color in a large vector. With this data, I do some calculations which are irrelevant. What I want to optimize is the accessing part.
Keep in mind that the large vector has an extra dimension. Every pixel belongs to some defined areas of the image. For every area it belongs to, it has an element in the big vector. So, if a pixel belongs in 4 areas(eg 3,9,12,13) then the data I want to access is: data[colorIndex][3],data[colorIndex][9],data[colorIndex][12],data[colorIndex][13]
.
I think that's enough to explain the code which is the following:
//Just filling with data for the sake of the example
int cols = 200; int rows = 200;
cv::Mat image(200, 200, CV_8UC3);
image.setTo(Scalar(100, 100, 100));
int numberOfAreas = 50;
//For every pixel (first dimension) we have a vector<int> containing ones for every area the pixel belongs to.
//For this example, every pixel belongs to every area.
vector<vector<int>> areasThePixelBelongs(200 * 200, vector<int>(numberOfAreas, 1));
int numberOfBins = 32;
int sizeOfBin = 256 / numberOfBins;
vector<vector<float>> data(pow(numberOfBins, 3), vector<float>(numberOfAreas, 1));
//Filling complete
//Part I need to optimize
uchar* matPointer;
for (int y = 0; y < rows; y++) {
matPointer = image.ptr<uchar>(y);
for (int x = 0; x < cols; x++) {
int red = matPointer[x * 3 + 2];
int green = matPointer[x * 3 + 1];
int blue = matPointer[x * 3];
int binNumberRed = red / sizeOfBin;
int binNumberGreen = green / sizeOfBin;
int binNumberBlue = blue / sizeOfBin;
//Instead of a 3d vector where I access the elements like: color[binNumberRed][binNumberGreen][binNumberBlue]
//I use a 1d vector where I just have to calculate the 1d index as follows
int index = binNumberRed * numberOfBins * numberOfBins + binNumberGreen * numberOfBins + binNumberBlue;
vector<int>& areasOfPixel = areasThePixelBelongs[y*cols+x];
int numberOfPixelAreas = areasOfPixel.size();
for (int i = 0; i < numberOfPixelAreas; i++) {
float valueOfInterest = data[index][areasOfPixel[i]];
//Some calculations here...
}
}
}
Would it be better accessing each mat element as a Vec3b? I think I'm essentially accessing an element 3 times for each pixel using uchar. Would accessing one Vec3b be faster?
First of all vector<vector<T>>
is not efficiently stored in memory as it is not contiguous. This as often a big impact on performance and should be avoided as mush as possible (especially when the inner arrays are of the same size). Instead of this, you can use std::array
for fixed-size arrays or a flatten std::vector (with the size dim1 * dim2 * ... dimN
).
Moreover, the loop is a good candidate for parallelization. You can parallelize this code easily with OpenMP. This assumes Some calculations here
can be implemented in a thread-safe way (you should be careful about shared writes if any). If this code is embarrassingly-parallel, then the resulting parallel code can be much faster. Still, using multi-threading introduces some overhead which may be too big compared to the overall computation time (which is highly dependent of the content in Some calculations here
).
Finally, regarding the content in Some calculations here
it may or may not be possible to adapt the code so the compiler use SIMD instructions. The data[index][areasOfPixel[i]]
will likely prevent most compiler to do that, but the following computation could be. Note that software prefetching and gather instructions may help to speed up a bit the data[index][areasOfPixel[i]]
operation.
Note that the way you access pixels should not have a significant impact on the runtime as the computation should be bounded by the speed of the inner loop iterating on areas containing some unknown code (unless this unknown code actually access pixels too).