I've been wondering if there is any standard in C++, that allows you to expand static expression, based on the arrays or some other predefined sets of data, to N rows of code with their internal data.
Example: if there is a need to sum up 5 adjacent matrix cells I would write a function like this:
int sumPlus(const vector<vector<int>>& matrix, int atX, int atY) {
static const pair<int,int> positions[5] = {{0,0},{0,1},{0,1},{-1,0},{0,-1}};
int sum = 0;
for(int i = 0;i < 5;++i) {
sum += matrix[atY + positions[i].first][atX + positions[i].second];
}
return sum;
}
but as I know, it will be translated to code with jumps (jmp or jle instructions) which is slow code. So the question itself: is there any standard that will provide the syntax, like this:
int sumPlus(const vector<vector<int>>& matrix, int atX, int atY) {
expression_array pair<int,int> positions[5] = {{0,0},{0,1},{0,1},{-1,0},{0,-1}};
int sum = 0;
expression<positions : position>(sum += matrix[atY + position.first][atX + position.second]);
return sum;
}
which will be expanded later into this by the compiler:
int sumPlus(const vector<vector<int>>& matrix, int atX, int atY) {
int sum = 0;
sum += matrix[atY + 0][atX + 0];
sum += matrix[atY + 1][atX + 0];
sum += matrix[atY + (-1)][atX + 0];
sum += matrix[atY + 0][atX + 1];
sum += matrix[atY + 0][atX + (-1)];
return sum;
}
Both versions produce identical assembly (when made equivalent), see https://godbolt.org/z/7Kf59r6o8
The optimization that you're missing is called "loop unrolling". The compiler already fully or partially unrolls loops, meaning that code such as
for(int i = 0; i < 5; ++i) { /* do stuff */; }
... is automatically translated into
/* do stuff where i = 0 */;
/* do stuff where i = 1 */;
/* do stuff where i = 2 */;
/* do stuff where i = 3 */;
/* do stuff where i = 4 */;
Of course, it only happens if the compiler deems this worth it, which depends on heuristics. Clang is somewhat more aggressive in loop unrolling than GCC. Just let the compiler figure it out, or, if you have done profiling and can know that the compiler makes a bad decision, you can improve it by forcing loop unrolling:
#pragma clang loop unroll_count(2)
Personally, I would have written
// I have faith that this loop gets unrolled.
for (int i = -1; i <= 1; ++i) {
// I have faith that this loop gets unrolled too.
for (int j = -1; j <= 1; ++j) {
// I have faith that this if statement is optimized out.
if (i == 0 || j == 0) {
sum += matrix[atY + i][atX + j];
}
}
}
expression<positions : position>(sum += matrix[atY + position.first][atX + > position.second]);
To get something along these lines, you could create some monstrosity made of std::index_sequence
and fold expressions, but it's not worth it here.
In the end, it would look something like
int sum(const vector<vector<int>>& matrix, int atX, int atY) {
static constexpr int positions[5][2] = {{0,0},{0,1},{1,0},{-1,0},{0,-1}};
return [atX, atY]<std::size_t... I>(std::index_sequence<I...>) {
return (matrix[atY + position[I][0]][atX + position[I][1]] + ... + 0);
}(std::make_index_sequence<I...>{});
}
Anyhow, don't do this. It's a purely academic code sample.