Tags: c++, c++11, boost, stl, c++-amp

Static for cycle


I am writing templated short vector and small matrix classes that are not restricted to 2, 3 or 4 elements, but can have an arbitrary number of elements.

template <typename T, size_t N>
class ShortVector
{
public:

    ...

    template <size_t I> T& get() { return m_data[I]; }
    template <size_t I> const T& get() const { return m_data[I]; }

private:

    T m_data[N];
};

I want the access interface to be static so that I can specialize the class to use built-in vector registers for the supported sizes (be they AVX, C++ AMP or OpenCL vectors). The problem is that writing ALL the desirable operators for this class (unary -, +, -, *, /, dot, length, ...) requires an awful lot of template recursion, and I haven't even started implementing matrix-vector and matrix-matrix multiplication, where I will need nested recursion.

Right now I have non-member friend operators and a private member class with various static functions such as

template <size_t I, typename T1, typename T2> struct Helpers
{
    static void add(ShortVector& dst, const ShortVector<T1, N>& lhs, const ShortVector<T2, N>& rhs)
    {
        dst.template get<I>() = lhs.template get<I>() + rhs.template get<I>();
        Helpers<I - 1, T1, T2>::add(dst, lhs, rhs);
    }

    ...
};
template <typename T1, typename T2> struct Helpers < 0, T1, T2 >
{
    static void add(ShortVector& dst, const ShortVector<T1, N>& lhs, const ShortVector<T2, N>& rhs)
    {
        dst.template get<0>() = lhs.template get<0>() + rhs.template get<0>();
    }

    ...
};

Writing static functions and specializations like this for all operators just feels wrong. Writing the more complex operations in this manner is highly error prone. What I'm looking for is something like

static_for< /*Whatever's needed to define something like a run-time for cycle*/, template <size_t I, typename... Args> class Functor>();

Or practically anything that lets me omit the majority of this boilerplate code. I have started writing such a class, but I could not get it to compile with a reasonable specialization. I feel I still lack the skill to write such a class (or function). I have looked at other libraries such as Boost MPL, but have not fully committed to using it. I have also looked at std::index_sequence, which might also prove useful.

While std::index_sequence seems like the most portable solution, it has a major flaw that I am reluctant to overlook. Ultimately these classes must be SYCL compatible, meaning I am restricted to C++11, including its template metaprogramming techniques. std::integer_sequence is a C++14 standard library addition. Although the restriction on the language standard only concerns language features, nothing prevents a standard library implementer from using C++14 language features to implement a C++14 library feature, so relying on C++14 library features might not be portable.

I am open to suggestions, or even solutions.

EDIT

Here is what I've come up with so far. This is the header of template metaprogramming tricks I have started to collect, and the for loop would be next in line. The helper needs a functor that takes the running index as its first parameter, and accepts various predicates. It keeps instantiating the functor as long as the predicate for the next iteration holds true. It would be possible to have the running index increment by any number, be multiplied by a number, etc.
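The header itself is not reproduced here. Purely as an illustration of the loop described above, and not the actual code from that header, a predicate-driven compile-time loop could be sketched in C++11 roughly as follows. Every name in it (`static_for`, `LessThan`, `Increment`, `AddElement`, `Vec`, `add_every`) is hypothetical, and the body is assumed to be a class template with a static `call` function rather than the `template <size_t I, typename... Args> class Functor` shape mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for ShortVector; only the static get<I>() accessors matter.
template <typename T, std::size_t N>
struct Vec
{
    T data[N];
    template <std::size_t I> T& get() { return data[I]; }
    template <std::size_t I> const T& get() const { return data[I]; }
};

// Predicate policy (assumed name): keep looping while I < End.
template <std::size_t End>
struct LessThan
{
    template <std::size_t I>
    struct test : std::integral_constant<bool, (I < End)> {};
};

// Step policy (assumed name): advance the running index by Stride.
template <std::size_t Stride>
struct Increment
{
    template <std::size_t I>
    struct next : std::integral_constant<std::size_t, I + Stride> {};
};

// Primary template: the predicate holds for I, so run the body and recurse.
template <std::size_t I, typename Pred, typename Step,
          template <std::size_t> class Body,
          bool Continue = Pred::template test<I>::value>
struct static_for
{
    template <typename... Args>
    static void run(Args&... args)
    {
        Body<I>::call(args...);
        static_for<Step::template next<I>::value, Pred, Step, Body>::run(args...);
    }
};

// Terminating specialization: the predicate failed, so do nothing.
template <std::size_t I, typename Pred, typename Step,
          template <std::size_t> class Body>
struct static_for<I, Pred, Step, Body, false>
{
    template <typename... Args>
    static void run(Args&...) {}
};

// Loop body (assumed shape): one element of an element-wise addition.
template <std::size_t I>
struct AddElement
{
    template <typename Dst, typename Lhs, typename Rhs>
    static void call(Dst& dst, const Lhs& lhs, const Rhs& rhs)
    {
        dst.template get<I>() = lhs.template get<I>() + rhs.template get<I>();
    }
};

// Adds every Stride-th element of two fixed vectors into a zeroed result.
template <std::size_t Stride>
Vec<int, 4> add_every()
{
    Vec<int, 4> lhs = {{1, 2, 3, 4}};
    Vec<int, 4> rhs = {{10, 20, 30, 40}};
    Vec<int, 4> dst = {{0, 0, 0, 0}};
    static_for<0, LessThan<4>, Increment<Stride>, AddElement>::run(dst, lhs, rhs);
    return dst;
}

// Small self-check; call it from main() to exercise the sketch.
inline void static_for_example()
{
    const Vec<int, 4> dense = add_every<1>();   // visits indices 0, 1, 2, 3
    assert(dense.get<0>() == 11 && dense.get<3>() == 44);
    const Vec<int, 4> sparse = add_every<2>();  // visits indices 0 and 2 only
    assert(sparse.get<1>() == 0 && sparse.get<2>() == 33);
}
```

The predicate and the step are separate policy types so that a multiplying step or a different stop condition can be swapped in without touching `static_for` itself. The recursion depth grows with the number of iterations, which should be acceptable for short vectors but is worth keeping in mind for larger counts.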


Solution

  • What about this:

    template <size_t I, typename Functor, typename = std::make_index_sequence<I>>
    struct Apply;
    
    template <size_t I, typename Functor, std::size_t... Indices>
    struct Apply<I, Functor, std::index_sequence<Indices...>> :
        private std::tuple<Functor> // For EBO with functors
    {
        Apply(Functor f) :  std::tuple<Functor>(f) {}
        Apply() = default;
    
        template <typename InputRange1, typename InputRange2, typename OutputRange>
        void operator()(OutputRange& dst,
                        const InputRange1& lhs, const InputRange2& rhs) const
        {
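            // The ranges have dependent types, so get<> needs the template keyword.
            (void)std::initializer_list<int>
            { (dst.template get<Indices>() =
                   std::get<0>(*this)(lhs.template get<Indices>(),
                                      rhs.template get<Indices>()), 0)... };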
        }
    };
    

    Usage could be

    Apply<4,std::plus<>>()(dest, lhs, rhs); // Size or functor type 
                                            // can be deduced if desired
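For reference, the pieces above can be put together into a complete translation unit along these lines. This is a sketch that assumes a C++14 compiler (for `std::make_index_sequence` and `std::plus<>`). `Vec` is a hypothetical stand-in for the question's `ShortVector`, and `add_example` is an illustrative helper that is not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <tuple>
#include <utility>

// Hypothetical stand-in for ShortVector; only get<I>() is required by Apply.
template <typename T, std::size_t N>
struct Vec
{
    T data[N];
    template <std::size_t I> T& get() { return data[I]; }
    template <std::size_t I> const T& get() const { return data[I]; }
};

template <std::size_t I, typename Functor,
          typename = std::make_index_sequence<I>>
struct Apply;

template <std::size_t I, typename Functor, std::size_t... Indices>
struct Apply<I, Functor, std::index_sequence<Indices...>> :
    private std::tuple<Functor> // For EBO with functors
{
    Apply(Functor f) : std::tuple<Functor>(f) {}
    Apply() = default;

    template <typename InputRange1, typename InputRange2, typename OutputRange>
    void operator()(OutputRange& dst,
                    const InputRange1& lhs, const InputRange2& rhs) const
    {
        // Braced initializer lists evaluate their elements left to right,
        // so the assignments run in index order.
        (void)std::initializer_list<int>
        { (dst.template get<Indices>() =
               std::get<0>(*this)(lhs.template get<Indices>(),
                                  rhs.template get<Indices>()), 0)... };
    }
};

// Illustrative helper: element-wise sum of two fixed 4-element vectors.
inline Vec<int, 4> add_example()
{
    Vec<int, 4> lhs{{1, 2, 3, 4}};
    Vec<int, 4> rhs{{10, 20, 30, 40}};
    Vec<int, 4> dst{};
    Apply<4, std::plus<>>()(dst, lhs, rhs);
    assert(dst.get<0>() == 11 && dst.get<3>() == 44);
    return dst;
}
```

Calling `add_example()` from `main()` exercises the whole pack expansion once. Any binary functor with a suitable call operator, such as `std::multiplies<>`, should be usable in place of `std::plus<>`.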
    

    A (slightly modified) example: Demo.

    You could also remove the functor state if it hinders you in any way:

    template <size_t I, typename Functor, typename = std::make_index_sequence<I>>
    struct Apply;
    
    template <size_t I, typename Functor, std::size_t... Indices>
    struct Apply<I, Functor, std::index_sequence<Indices...>>
    {
        template <typename InputRange1, typename InputRange2, typename OutputRange>
        void operator()(OutputRange& dst,
                        const InputRange1& lhs, const InputRange2& rhs) const
        {
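            // The ranges have dependent types, so get<> needs the template keyword.
            (void)std::initializer_list<int>
            { (dst.template get<Indices>() =
                   Functor()(lhs.template get<Indices>(),
                             rhs.template get<Indices>()), 0)... };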
        }
    };
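Note that `std::make_index_sequence` and the transparent `std::plus<>` are both C++14 library features, whereas the question is restricted to C++11. One possible way around that is to write the index sequence by hand using only C++11 language features. The following is a minimal sketch under that assumption. The `cxx11` namespace, `make_impl`, `cxx11::plus`, `Vec` and `cxx11_add_example` are hypothetical names introduced for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

namespace cxx11 { // hypothetical namespace for the hand-written replacements

template <std::size_t... Is>
struct index_sequence {};

// make_impl<3> -> make_impl<2, 2> -> make_impl<1, 1, 2> -> make_impl<0, 0, 1, 2>
template <std::size_t N, std::size_t... Is>
struct make_impl : make_impl<N - 1, N - 1, Is...> {};

template <std::size_t... Is>
struct make_impl<0, Is...>
{
    typedef index_sequence<Is...> type;
};

template <std::size_t N>
using make_index_sequence = typename make_impl<N>::type;

// C++11 replacement for the transparent std::plus<>.
struct plus
{
    template <typename A, typename B>
    auto operator()(const A& a, const B& b) const -> decltype(a + b)
    {
        return a + b;
    }
};

} // namespace cxx11

static_assert(std::is_same<cxx11::make_index_sequence<3>,
                           cxx11::index_sequence<0, 1, 2>>::value,
              "make_index_sequence<3> should expand to 0, 1, 2");

// Hypothetical stand-in for ShortVector; only get<I>() is required.
template <typename T, std::size_t N>
struct Vec
{
    T data[N];
    template <std::size_t I> T& get() { return data[I]; }
    template <std::size_t I> const T& get() const { return data[I]; }
};

// Stateless Apply from above, switched to the hand-written sequence.
template <std::size_t I, typename Functor,
          typename = cxx11::make_index_sequence<I>>
struct Apply;

template <std::size_t I, typename Functor, std::size_t... Indices>
struct Apply<I, Functor, cxx11::index_sequence<Indices...>>
{
    template <typename InputRange1, typename InputRange2, typename OutputRange>
    void operator()(OutputRange& dst,
                    const InputRange1& lhs, const InputRange2& rhs) const
    {
        (void)std::initializer_list<int>
        { (dst.template get<Indices>() =
               Functor()(lhs.template get<Indices>(),
                         rhs.template get<Indices>()), 0)... };
    }
};

// Illustrative helper: mixed-type addition (int + double -> double).
inline Vec<double, 3> cxx11_add_example()
{
    Vec<int, 3> lhs = {{1, 2, 3}};
    Vec<double, 3> rhs = {{0.5, 0.5, 0.5}};
    Vec<double, 3> dst = {{0.0, 0.0, 0.0}};
    Apply<3, cxx11::plus>()(dst, lhs, rhs);
    assert(dst.get<0>() == 1.5 && dst.get<2>() == 3.5);
    return dst;
}
```

This generator recurses linearly, one template instantiation per index, which should be fine for short vectors. For large sizes a logarithmic-depth generator would be a better fit.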