I have an array that is "expanded" along the first (row) dimension. For example, for an image of 1080 rows and 1920 columns, the expanded array has (8*1080) rows and 1920 columns, where 8 is the "row block" size. I want to build a new array of size 8x1 whose i-th element (i = 0 to 7) holds the sum of the i-th element of every block.
In the above example, the first element of the new array (i=0) will be the sum of these pixels in the expanded array (linear indices, column-wise):
0, 8 (because 8 is the FIRST element of the second block), 16 (third block), ...
Similarly, the second element (i=1) will be the sum of:
1, 9, 17, ...
I think this can be parallelized, but I have not been able to solve it. I tried gfor but could not find a way to do it. Is this not possible with ArrayFire? Any help is appreciated!
Here is some code that I tried: rx is the 8x1 array (p_squared_1 = 8) and rx_all is the expanded (p_squared*rows, columns) array. Note that I am calling the seq "+" operator explicitly because writing "i+p_squared_1" is ambiguous, I think. This is probably a mistake on my part, but I could not find another way to add a value to a seq object.
af::array rx(p_squared_1, 1);
gfor(af::seq i, rows*cols*(p_squared_1-1)) {
rx(i) = af::sum<float>(rx_all(i.operator+( (const int)p_squared_1)));
}
af::eval(rx);
cout << af::sum<float>(rx);
I expect to get an 8x1 array where each i-th element is the sum of the i-th elements of each block in the expanded array.
I think you can achieve this by performing an af::moddims followed by an af::sum.
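// Expanded image: 8*1080 rows, 1920 columns (column-major storage)
array img_expanded(1080*8, 1920);
// Reinterpret as 8 rows x (1920*1080) columns; each column is one block of 8
array img_expanded_reshaped = moddims(img_expanded, 8, 1920*1080);
// Sum along dimension 1 (across the columns) to get an 8x1 result
array result = sum(img_expanded_reshaped, 1);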
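The moddims call reshapes the array into an 8x(1920*1080) array, and then you perform the summation across the second dimension. Because ArrayFire stores arrays in column-major order and moddims only changes the dimensions, not the order of the data, each column of the reshaped array is one block of 8 consecutive elements, so row i collects the i-th element of every block.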
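To make the index mapping concrete, here is a minimal CPU-only sketch in plain C++ (no ArrayFire calls) that mimics the reshape and the sum. It assumes column-major storage, so element (r, c) of the 8 x N view sits at linear index r + 8*c. The name `block_sums_column_major` is made up for illustration and is not part of any library. It might also serve as a reference to check the GPU result against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ArrayFire): for a column-major buffer made
// of consecutive blocks of `block` values, returns a vector whose i-th entry
// is the sum of the i-th value of every block.
// Viewing the buffer as a (block x N) matrix, this is the sum along dimension 1.
std::vector<float> block_sums_column_major(const std::vector<float>& data,
                                           std::size_t block) {
    assert(block > 0 && data.size() % block == 0);
    const std::size_t n_blocks = data.size() / block;  // N columns of the view
    std::vector<float> result(block, 0.0f);
    for (std::size_t c = 0; c < n_blocks; ++c) {
        for (std::size_t r = 0; r < block; ++r) {
            result[r] += data[r + block * c];  // element (r, c) of the view
        }
    }
    return result;
}
```

For a buffer holding the values 0..15 and a block size of 8, entry i comes out as i + (i + 8), which matches the 0, 8, 16, ... and 1, 9, 17, ... index pattern from the question.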
You could get better performance if you treated the 1920 side as the leading dimension. Not only will this match the layout of the image in CPU memory and avoid doing the transpose on transfers to and from the GPU, but the reshaped array will also have a larger first dimension, so it will have better GPU utilization.
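// Same data with the 1920 side as the leading (first) dimension
array img_expanded(1920, 1080*8);
// Reinterpret as (1920*1080) rows x 8 columns
array img_expanded_reshaped = moddims(img_expanded, 1920*1080, 8);
// Sum along dimension 0 (down each column) to get a 1x8 result
array result = sum(img_expanded_reshaped, 0);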
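This will require you to refactor more than this part of the code, since everything that produces or consumes the expanded array has to use the same layout. Note also that the result here is 1x8 rather than 8x1.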
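As a rough illustration of what the second variant computes, here is a CPU-only sketch in plain C++ (again no ArrayFire calls). It assumes, as I read the reshaped layout, that the buffer consists of 8 contiguous chunks of equal length, each forming one column of the column-major (1920*1080) x 8 view, so every output value is a reduction over one contiguous run of memory. The name `chunk_sums` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of ArrayFire): splits the buffer into
// `n_chunks` contiguous chunks of equal length and returns each chunk's sum.
// Viewing the buffer as a column-major (len x n_chunks) matrix, this is the
// sum along dimension 0.
std::vector<float> chunk_sums(const std::vector<float>& data,
                              std::size_t n_chunks) {
    assert(n_chunks > 0 && data.size() % n_chunks == 0);
    const std::size_t len = data.size() / n_chunks;  // rows of the view
    std::vector<float> result(n_chunks);
    for (std::size_t j = 0; j < n_chunks; ++j) {
        auto first = data.begin() + static_cast<std::ptrdiff_t>(j * len);
        auto last = first + static_cast<std::ptrdiff_t>(len);
        result[j] = std::accumulate(first, last, 0.0f);  // column j of the view
    }
    return result;
}
```

Unlike the interleaved layout in the question, the values that contribute to one output here are stored next to each other, which is why the data has to be arranged differently before this variant gives the same answer.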