Search code examples
c++tensorcompile-time

C++ access arrays based on 1 and not 0, by shifting at compile time


I'm writing a small tensor library (inspired from here: http://www.wlandry.net/Projects/FTensor), relying on C++11 or 14.

For the sake of clarity, I would like to access the indices starting from 1 ,and not 0. For instance, for a vector v, use v(1), v(2) and v(3) to access its components, stored in a regular 1D array. (edit: This is for a continuum mechanics application. People are used to start at 1. Yes, it is un-natural for programmers.)

Now, I already now how to do it at run time. I would like to know if we can avoid this computation at runtime, as it get more complex for higher order tensors in case of symmetries.

I found this way of writing a compile time map where i manually put the shift: https://stackoverflow.com/a/16491172/1771099

However, using the code in the SO answer above, I cannot do something like this:

for i=1; i<4;i++){
    std::cout << mymap::get<i>::val << std::endl;
}

since i is not known a compile time. Is there a smart way to do it ? Build a kind of static map, for each tensor order and symmetry, than we can lookup at runtime ?

Thanks

Edit: Here is what I consider "bad". The constexpr are actually useless

  // subtract 1 to an index
  inline constexpr size_t
  sub_1_to_idx(size_t const &i) {
      return i - 1;
  }

  // build a parameter pack from (a,b,c,...) to (a-1,b-1,c-1,...)
  template<typename... idx>
  constexpr std::array<size_t, sizeof...(idx)>
  shift_idx(const idx &... indices) {
          return std::array<size_t, sizeof...(idx)> 
   {{sub_1_to_idx(indices)...}};

  };

  //use this to return an index of a 1D array where the values of a possibly multidimentional, possibly symmetric Tensor are stored.
  template<>
  template<typename... idx>
  size_t
  TA_Index<2, Tsym::IJ>::of(const idx&... indices) {
      static_assert(sizeof...(indices) == 2, "*** ERROR *** Please access second order tensor with 2 indices");

      const std::array<size_t, sizeof...(idx)> id = shift_idx(indices...);
      return id[0] > id[1] ? (id[0] + (id[1] * (2 * 3 - id[1] - 1)) / 2)
                           : (id[1] + (id[0] * (2 * 3 - id[0] - 1)) / 2);

Solution

  • There isn't much you can change about this. Yes, you could create a dense compile-time lookup table, but would you access it one-based (requires padding, i.e. space overhead) or zero-based (requires re-indexing, i.e. time overhead)? And would it actually buy you any meaningful performance?

    Have you compared the assembly resulting from one-based vs zero-based indices?

    https://godbolt.org/z/9pKt6Q

    They are virtually identical, save for two lea (to, well, subtract 1 from both input indices) and an additional cmova (no idea what's going on there). If the zero-based code was good enough, the one-based one will be equally fine.

    Also, in either case, the compiler perfectly constant-folded the indices that were known at compile-time. That's bread-and-butter optimization for a compiler, so whether or not your functions are constexpr or not is going to be insubstantial when indices are known at compile-time.

    And finally, for operations on higher-dimensional/larger tensors, the semi-random access pattern incurred from the index calculations will likely be a much bigger performance bottleneck (due to not using the cache efficiently) than the index calculations themselves.