I have a big function, which for the MWE below is called modify_vec
. In it, there are for loops where an operation is performed many times (e.g., for each element of a matrix).
I would like the function to be capable of applying different operations, depending on different given inputs, without using if-else statements within the loop.
From my (limited) understanding of compiler optimizations using SIMD, I would have to: either manually write all possible combinations of the operations for the function, so that the compiler knows at compile-time which operation to optimize for; or using templates somehow.
But I do not know how to perform this using templates. Instead, I've tried with std::map<const int, const std::function>
.
Below are some code snippets and a MWE, which can also be seen in Compiler Explorer: https://godbolt.org/z/z777aq5ab.
Bonus question: since none of my actual vectors are defined at compile-time, does "constexpr" even help?
A version of modify_vec
that only works for a specific order of operations, where p
is an argument common to all possible operations.
void modify_vec(std::vector<double> &inp_vec,
const std::vector<double> &p) {
const auto n = inp_vec.size();
for (auto i=0; i<n; ++i) {
inp_vec[i] = inp_vec[i]*p[0]+3.; // this is operation 1 in the MWE
}
for (auto i=0; i<n; ++i) {
inp_vec[i] = inp_vec[i]/p[1]-4.; // this is operation 2 in the MWE
}
}
My attempt at generalizing it, with only two distinct operations, although more may be used in my actual problem:
// Are these functions inlined with GCC+11?
constexpr double first_op(const double input, const double p) {
return input*p+3.;
}
constexpr double second_op(const double input, const double p) {
return input/p-4.;
}
std::map<const int,
const std::function<double(const double, const double)>> op {
{0, first_op},
{1, second_op}
};
void modify_vec_gen(std::vector<double> &inp_vec,
const std::vector<double> &p,
const int op1, const int op2) {
const auto n = inp_vec.size();
for (auto i=0; i<n; ++i) {
inp_vec[i] = op[op1](inp_vec[i], p[0]); // Can SIMD be used here?
}
for (auto i=0; i<n; ++i) {
inp_vec[i] = op[op2](inp_vec[i], p[1]);
}
}
My main file, main.cpp
, contains the above code snippets and the following, compiled with g++ -std=c++11 -O3 -ffast-math -mavx2 main.cpp -o main
.
#include <iostream>
#include <vector>
#include <functional>
#include <map>
/* Copy and paste the above snippets here */
template <typename T>
constexpr void print_vec(const std::vector<T> &vec) {
for (const auto v : vec) {
std::cout << v << " ";
}
std::cout<<std::endl;
}
int main(int argc, char** argv) {
int op1 = std::stoi(argv[1]);
int op2 = std::stoi(argv[2]);
auto vec1 = std::vector<double>{2., 5., -3., 10.};
const auto p = std::vector<double>{3., 2.};
auto vec2{vec1};
auto vec3{vec1};
std::cout << "Initial vector:" << std::endl;
print_vec(vec1);
std::cout << "After operations 0 and 1:" << std::endl;
modify_vec(vec1, p);
print_vec(vec1);
std::cout << "After operations " << op1 << " and " << op2 << ":" << std::endl;
modify_vec_gen(vec2, p, op1, op2);
print_vec(vec2);
std::cout << "After operations " << op1 << " and " << op1 << ":" << std::endl;
modify_vec_gen(vec3, p, op1, op1);
print_vec(vec3);
return 0;
}
Running with ./main 0 1
yields:
Initial vector:
2 5 -3 10
After operations 0 and 1:
0.5 5 -7 12.5
After operations 0 and 1:
0.5 5 -7 12.5
After operations 0 and 0:
21 39 -9 69
A possible, based on jwezorek's answer is shown below. In Compiler Explorer: https://godbolt.org/z/K3KefzTYW.
Compiled using -O3 -ffast-math -mavx2 -std=c++11 -fopt-info-vec-all -Wall -pedantic -Werror
in gcc, -O3 -ffast-math -mavx2 -std=c++11 -Rpass-analysis=loop-vectorize -Wall -pedantic -Werror
in clang, and /O2 /FA /arch:AVX2 /std:c11 /Qvec-report:2
in MSVC.
But MSVC is not able to vectorize the code, only GCC is.
The helper function call_modify_vec
is pretty ugly, but, since I will only use two distinct operations, the first having 2 possibilities and the other 4, I guess it's good enough.
#include <iostream>
#include <vector>
#ifdef _WIN32
#define MYRESTRICT __restrict
#else
#define MYRESTRICT __restrict__
#endif
constexpr double first_op(const double input, const double p) {
return input*p+3.;
}
constexpr double second_op(const double input, const double p) {
return input/p-4.;
}
using op_fn = double(*)(const double input, const double p);
const op_fn op_vec[] = {first_op, second_op};
template<int OP1, int OP2>
void modify_vec_template(double * MYRESTRICT inp_vec,
const std::size_t vec_size,
const std::vector<double> &p) {
std::size_t i;
for (i=0; i<vec_size; ++i) {
inp_vec[i] = op_vec[OP1](inp_vec[i], p[0]);
}
for (i=0; i<vec_size; ++i) {
inp_vec[i] = op_vec[OP2](inp_vec[i], p[1]);
}
}
void call_modify_vec(std::vector<double> &in_vec,
const std::vector<double> &p,
const int op1, const int op2) {
if ( (op1 == 0) && (op2 == 0)) {
modify_vec_template<0, 0>(in_vec.data(), in_vec.size(), p);
} else if ( (op1 == 0) && (op2 == 1)) {
modify_vec_template<0, 1>(in_vec.data(), in_vec.size(), p);
} else if ( (op1 == 1) && (op2 == 0)) {
modify_vec_template<1, 0>(in_vec.data(), in_vec.size(), p);
} else if ( (op1 == 1) && (op2 == 1)) {
modify_vec_template<1, 1>(in_vec.data(), in_vec.size(), p);
}
}
I have found a way to code this that works both for GCC and MSVC. It requires OpenMP for MSVC, though. GCC can do without OpenMP.
The idea is to make a constexpr
function containing all possible smaller operations with if
-else
statements. The conditionals are evaluated using a non-type template parameter (could also be an enum, I think).
Then template the do_sth
function with all distinct possible operations and use #pragma omp simd
on the for loops. Each possible combination of operations has to be called in a helper/caller function call_do_sth
.
This obviously doesn't scale to a large amount of different combinations, but it works if you keep it small (see the discussions in this Stack Overflow post about runtime specification of template parameters).
This can also be seen in Compiler Explorer.
The flags for GCC are: -O3 -ffast-math -mavx2 -fopenmp fopt-info-vec-all
And the flags for MSVC are: /O2 /fp:fast /arch:AVX2 /openmp:experimental /Qvec-report:2
#ifdef _WIN32
#define MYRESTRICT __restrict
#else
#define MYRESTRICT __restrict__
#endif
template<int op>
constexpr double funs(const double val1, const double val2) noexcept {
if (op == 0) { return val1*val2-10.; }
else if (op == 1) { return val1/val2+4.; }
}
template<int op1, int op2>
void do_sth(double * MYRESTRICT arr, const std::size_t size,
const double p) {
std::size_t i;
#pragma omp simd
for (i = 0; i < size; ++i) {
arr[i] = funs<op1>(arr[i], p);
}
#pragma omp simd
for (i = 0; i < size; ++i) {
arr[i] = funs<op2>(arr[i], p);
}
}
void call_do_sth(double *arr, const std::size_t size,
const double p, const int op1, const int op2) {
if ((op1 == 0) && (op2 == 0)) {
do_sth<0, 0>(arr, size, p);
} else if ((op1 == 0) && (op2 == 1)) {
do_sth<0, 1>(arr, size, p);
} else if ((op1 == 1) && (op2 == 0)) {
do_sth<1, 0>(arr, size, p);
} else if ((op1 == 1) && (op2 == 1)) {
do_sth<1, 1>(arr, size, p);
}
}