I'm writing a small tensor library (inspired by FTensor: http://www.wlandry.net/Projects/FTensor), relying on C++11 or C++14.
For the sake of clarity, I would like to access indices starting from 1, not 0. For instance, for a vector v, use v(1), v(2) and v(3) to access its components, stored in a regular 1D array. (Edit: this is for a continuum mechanics application. People there are used to starting at 1. Yes, it is unnatural for programmers.)
Now, I already know how to do this at run time. I would like to know whether this computation can be avoided at runtime, since it gets more complex for higher-order tensors in the presence of symmetries.
I found this way of writing a compile-time map where I manually put the shift: https://stackoverflow.com/a/16491172/1771099
However, using the code in the SO answer above, I cannot do something like this:
for (int i = 1; i < 4; i++) {
    std::cout << mymap::get<i>::val << std::endl;
}
since i is not known at compile time. Is there a smart way to do it? Build some kind of static map, for each tensor order and symmetry, that we can look up at runtime?
Thanks
Edit: here is what I consider "bad". The constexpr qualifiers are actually useless here:
#include <array>
#include <cstddef>

// subtract 1 from an index
inline constexpr size_t
sub_1_to_idx(size_t const &i) {
    return i - 1;
}

// turn a parameter pack (a, b, c, ...) into (a-1, b-1, c-1, ...)
template<typename... idx>
constexpr std::array<size_t, sizeof...(idx)>
shift_idx(const idx &... indices) {
    return std::array<size_t, sizeof...(idx)>
        {{sub_1_to_idx(indices)...}};
}
// use this to return the index in the 1D array where the values of a
// possibly multidimensional, possibly symmetric tensor are stored
template<>
template<typename... idx>
size_t
TA_Index<2, Tsym::IJ>::of(const idx &... indices) {
    static_assert(sizeof...(indices) == 2,
                  "*** ERROR *** Please access a second-order tensor with 2 indices");
    const std::array<size_t, sizeof...(idx)> id = shift_idx(indices...);
    return id[0] > id[1] ? (id[0] + (id[1] * (2 * 3 - id[1] - 1)) / 2)
                         : (id[1] + (id[0] * (2 * 3 - id[0] - 1)) / 2);
}
There isn't much you can change about this. Yes, you could create a dense compile-time lookup table, but would you access it one-based (requires padding, i.e. space overhead) or zero-based (requires re-indexing, i.e. time overhead)? And would it actually buy you any meaningful performance?
Have you compared the assembly resulting from one-based vs. zero-based indices? They are virtually identical, save for two lea instructions (to, well, subtract 1 from both input indices) and an additional cmova (no idea what's going on there). If the zero-based code was good enough, the one-based code will be equally fine.
Also, in either case, the compiler perfectly constant-folded the indices that were known at compile time. That's bread-and-butter optimization for a compiler, so whether or not your functions are constexpr is going to be insubstantial when indices are known at compile time.
And finally, for operations on higher-dimensional/larger tensors, the semi-random access pattern resulting from the index calculations will likely be a much bigger performance bottleneck (due to not using the cache efficiently) than the index calculations themselves.