Function composition with functor - Performance overhead

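As we know, C++11 cannot deduce the return type of a function that returns a lambda: return type deduction with auto or decltype(auto), as well as generic lambdas with auto&& parameters, only arrived in C++14. So code like the following cannot be used.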

template <typename F, typename G>
constexpr decltype(auto) compose(const F& f, const G& g) noexcept 
{
    return [&f,&g](auto&& x)
    {
        return f(g(std::forward<decltype(x)>(x)));
    };
}

Instead, a workaround like the one below can be used:

template <typename F, typename G>
struct Composed
{
    const F &f;
    const G &g;

    Composed(const F &f, const G &g) : f(f), g(g) {}

    template <typename Arg>
    inline auto operator()(Arg &&arg) -> decltype(f(g(std::forward<Arg>(arg))))
    {
        return f(g(std::forward<Arg>(arg)));
    }
};

template <typename F, typename G>
Composed<F, G> compose(const F &f, const G &g)
{
    return Composed<F, G>(f, g);
}

As always, actual performance will vary from case to case, but how does this approach generally perform compared to a lambda capturing lambdas?

Would adding inline in front of operator() help, or is there still some unavoidable overhead?

Solution

Always benchmark your code whenever you suspect that one version is more efficient than another in run time, compile time, or memory use. For instance, using the tool at quick-bench.com to compare both versions, without storing the references and with the function pointers f<std::string> and g<std::string>, I get the following ^*result.

https://quick-bench.com/q/ZbgOBnT7Or1EJqXlSF-LHf4b8l0

^*Warning: I may be wrong in my assumptions about the parameters of the functions being passed; however, you can experiment with different code changes, compilers, and optimization options to see whether one approach is better than the other.
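For a quick local check without quick-bench, a minimal C++11 sketch along the following lines may help. It repeats the Composed functor from the question and times it against a hand-written equivalent using std::chrono. The names AddOne, Twice, Direct, Timing, and time_loop are hypothetical helpers invented for this illustration, and the integer workload is an assumption, not the std::string case measured above. An optimizer may fold such a simple loop into a closed form, so treat the timings as indicative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Composed functor as in the question (stores references, so the
// composed callables must outlive the Composed object).
template <typename F, typename G>
struct Composed
{
    const F &f;
    const G &g;

    Composed(const F &f, const G &g) : f(f), g(g) {}

    template <typename Arg>
    auto operator()(Arg &&arg) -> decltype(f(g(std::forward<Arg>(arg))))
    {
        return f(g(std::forward<Arg>(arg)));
    }
};

template <typename F, typename G>
Composed<F, G> compose(const F &f, const G &g)
{
    return Composed<F, G>(f, g);
}

// Hypothetical building blocks for the comparison.
struct AddOne { std::uint64_t operator()(std::uint64_t x) const { return x + 1; } };
struct Twice  { std::uint64_t operator()(std::uint64_t x) const { return x * 2; } };

// Hand-written baseline computing the same value as AddOne(Twice(x)).
struct Direct { std::uint64_t operator()(std::uint64_t x) const { return x * 2 + 1; } };

struct Timing
{
    std::uint64_t checksum; // sum of all results, returned so the loop is not dead code
    long long nanos;        // elapsed wall-clock time in nanoseconds
};

// Calls fn(i) for i in [0, n) and reports the checksum and elapsed time.
template <typename Fn>
Timing time_loop(Fn fn, std::uint64_t n)
{
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        sum += fn(i);
    const auto stop = std::chrono::steady_clock::now();

    Timing t;
    t.checksum = sum;
    t.nanos = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    return t;
}
```

To use it, create named AddOne and Twice objects, pass compose(f, g) and Direct() to time_loop with the same n, confirm the two checksums match, and then compare the nanos fields across several runs and optimization levels.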