Search code examples
c++templatesc++11binaryvariadic

Binary search using variadic templates and lambda functions


Consider this,

struct Person {
    std::string name;
    Person (const std::string& n) : name(n) {}
    std::string getName(int, char) const {return name;}  // int, char play no role in this
        // simple example, but let's suppose that they are needed.
} *Bob = new Person("Bob"), *Frank = new Person("Frank"), *Mark = new Person("Mark"),
    *Tom = new Person("Tom"), *Zack = new Person("Zack");

const std::vector<Person*> people = {Bob, Frank, Mark, Tom, Zack};

Because people is sorted by name, we can carry out a binary search to find the element of people with a specific name. I want the call for this to look something like

Person* person = binarySearch (people, "Tom",
   [](Person* p, int n, char c) {return p->getName(n,c);},
   [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}, 5, 'a');

so the template function binarySearch can be used generically. I got it working with the following:

#include <iostream>
#include <string>
#include <vector>
#include <functional>

struct Person {
    std::string name;
    Person (const std::string& n) : name(n) {}
    std::string getName(int, char) const {return name;}  // int, char play no role in this
        // simple example, but let's supposes that they are needed.
} *Bob = new Person("Bob"), *Frank = new Person("Frank"), *Mark = new Person("Mark"),
    *Tom = new Person("Tom"), *Zack = new Person("Zack");

const std::vector<Person*> people = {Bob, Frank, Mark, Tom, Zack};

template <typename Container, typename Ret>
typename Container::value_type binarySearch (const Container& container, const Ret& value,
        std::function<Ret(const typename Container::value_type&, int, char)> f,
        std::function<bool(const Ret&, const Ret&)> comp,
        typename Container::difference_type low, typename Container::difference_type high,
        int n, char c) {
    if (low > high)
        std::cout << "Error!  Not found!\n";
    const typename Container::difference_type mid = (low + high) / 2;
    const Ret& r = f(container[mid], n, c);
    if (r == value)
        return container[mid];
    if (comp(r, value))
        return binarySearch (container, value, f, comp, mid + 1, high, n, c);
    return binarySearch (container, value, f, comp, low, mid - 1, n, c);
}

template <typename Container, typename Ret>
typename Container::value_type binarySearch (const Container& container, const Ret& value,
        std::function<Ret(const typename Container::value_type&, int, char)> f,
        std::function<bool(const Ret&, const Ret&)> comp, int n, char c) {
    return binarySearch (container, value, f, comp, 0, container.size() - 1, n, c);
}

int main() {
    const Person* person = binarySearch<std::vector<Person*>, std::string>
        (people, "Tom", &Person::getName,
        [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}, 5, 'a');
    std::cout << person->getName(5,'a') << '\n';  // Tom
}

But now for reasons I don't understand, I'm not able to replace the specific arguments int, char with Args.... You can go ahead and place Args... args and args... where needed in the above code, and it won't compile. What is wrong here? How to carry out this last step in the generalization? Or should the whole method be changed?

This is what I tried:

template <typename Container, typename Ret, typename... Args>
typename Container::value_type binarySearch (const Container& container, const Ret& value,
        std::function<Ret(const typename Container::value_type&, Args...)> f,
        std::function<bool(const Ret&, const Ret&)> comp,
        typename Container::difference_type low, typename Container::difference_type high,
        Args... args) {
    if (low > high)
        std::cout << "Error!  Not found!\n";
    const typename Container::difference_type mid = (low + high) / 2;
    const Ret& r = f(container[mid], args...);
    if (r == value)
        return container[mid];
    if (comp(r, value))
        return binarySearch (container, value, f, comp, mid + 1, high, args...);
    return binarySearch (container, value, f, comp, low, mid - 1, args...);
}

template <typename Container, typename Ret, typename... Args>
typename Container::value_type binarySearch (const Container& container, const Ret& value,
        std::function<Ret(const typename Container::value_type&, Args...)> f,
        std::function<bool(const Ret&, const Ret&)> comp, Args... args) {
    return binarySearch (container, value, f, comp, 0, container.size() - 1, args...);
}

int main() {
    const Person* person = binarySearch<std::vector<Person*>, std::string> (people, "Tom",
            &Person::getName,
        [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}, 5, 'a');
    std::cout << person->getName(5,'a') << '\n';
}

GCC 4.9.2:

[Error] no matching function for call to 'binarySearch(std::vector<Person*>&, const char [4], main()::__lambda0, main()::__lambda1, int, char)'
template argument deduction/substitution failed:
[Note] 'main()::__lambda0' is not derived from 'std::function<std::basic_string<char>(Person* const&, Args ...)>'

Update: Having studied Yakk's solution, I've adapted my solution to the following (using more first principles instead of std::equal_range):

#include <iostream>
#include <iterator>

template <typename Container, typename T, typename Comparator = std::less<T>>
typename Container::value_type binarySearchRandomAccessIterator (const Container& container, T&& value, Comparator&& compare, typename Container::difference_type low, typename Container::difference_type high) {
    if (low > high)
        {std::cout << "Error!  Not found!\n";  return container[high];}
    const typename Container::difference_type mid = (low + high) / 2;
    const auto& t = compare.function(container[mid]);  // Using 'const T& t' does not compile.
    if (t == value)
        return container[mid];
    if (compare.comparator(t, value))  // 't' is less than 'value' according to compare.comparator, so search in the top half.
        return binarySearchRandomAccessIterator (container, value, compare, mid + 1, high);
    return binarySearchRandomAccessIterator (container, value, compare, low, mid - 1);  // i.e. 'value' is less than 't' according to compare.comparator, so search in the bottom half.
}

template <typename ForwardIterator, typename T, typename Comparator = std::less<T>>
typename std::iterator_traits<ForwardIterator>::value_type binarySearchNonRandomAccessIterator (ForwardIterator first, ForwardIterator last, T&& value, Comparator&& compare) {
    ForwardIterator it;
    typename std::iterator_traits<ForwardIterator>::difference_type count, step;
    count = std::distance(first, last);
    while (count > 0) {  // Binary search using iterators carried out.
        it = first;
        step = count / 2;
        std::advance(it, step);  // This is done in O(step) time since ForwardIterator is not a random-access iterator (else it is done in constant time).  But the good news is that 'step' becomes half as small with each iteration of this loop.
        const auto& t = compare.function(*it);  // Using 'const T& t' does not compile.
        if (compare.comparator(t, value)) {  // 't' is less than 'value' according to compare.comparator, so search in the top half.
            first = ++it;  // Thus first will move to one past the half-way point, and we search from there.
            count -= step + 1;  // count is decreased by half plus 1.
        }
        else  // 't' is greater than 'value' according to compare.comparator, so remain in the bottom half.
            count = step;  // 'count' and 'step' are both decreased by half.
    }
    if (compare.function(*first) != value)
        std::cout << "Error!  Not found!\n";
    return *first;
}

template <typename Container, typename T, typename Comparator = std::less<T>>  // Actually the version below could be used if Container has a random-access iterator.  It would be with the same time complexity since std::advance has O(1) time complexity for random-access iterators.
typename std::enable_if<std::is_same<typename std::iterator_traits<typename Container::iterator>::iterator_category, std::random_access_iterator_tag>::value, typename Container::value_type>::type
        binarySearch (const Container& container, T&& value, Comparator&& compare = {}) {
    std::cout << "Calling binarySearchWithRandomAccessIterator...\n";
    return binarySearchRandomAccessIterator (container, value, compare, 0, container.size() - 1);
}

// Overload used if Container does not have a random-access iterator.
template <typename Container, typename T, typename Comparator = std::less<T>>
typename std::enable_if<!std::is_same<typename std::iterator_traits<typename Container::iterator>::iterator_category, std::random_access_iterator_tag>::value, typename Container::value_type>::type
        binarySearch (const Container& container, T&& value, Comparator&& compare = {}) {
    std::cout << "Calling binarySearchNonRandomAccessIterator...\n";
    return binarySearchNonRandomAccessIterator (std::begin(container), std::end(container), value, compare);
}

template <typename Function, typename Comparator>
struct FunctionAndComparator {
    Function function;
    Comparator comparator;
    FunctionAndComparator (Function&& f, Comparator&& c) : function(std::forward<Function>(f)), comparator(std::forward<Comparator>(c)) {}
};

template <typename Function, typename Comparator = std::less<>>
FunctionAndComparator<std::decay_t<Function>, std::decay_t<Comparator>> functionAndComparator (Function&& f, Comparator&& c = {}) {
    return {std::forward<Function>(f), std::forward<Comparator>(c)};
}

#include <string>
#include <vector>
#include <list>

struct Person {
    std::string name;
    Person (const std::string& n) : name(n) {}
    std::string getName (int, char) const {return name;}  // int, char play no role in this simple example, but let's supposes that they are needed.
} *Bob = new Person("Bob"), *Frank = new Person("Frank"), *Mark = new Person("Mark"), *Tom = new Person("Tom"), *Zack = new Person("Zack");

const std::vector<Person*> peopleVector = {Bob, Frank, Mark, Tom, Zack};
const std::list<Person*> peopleList = {Bob, Frank, Mark, Tom, Zack};

int main() {
    Person* tom = binarySearch (peopleVector, "Tom", functionAndComparator([](const Person* p) {return p->getName(5,'a');}, [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}));
    if (tom) std::cout << tom->getName(5,'a') << " found.\n";

    Person* bob = binarySearch (peopleVector, "Bob", functionAndComparator([](const Person* p) {return p->getName(3,'k');}));  // The default comparator, std::less<std::string>, is actually the same as the comparator used above.
    if (bob) std::cout << bob->getName(3,'k') << " found.\n";

    Person* frank = binarySearch (peopleList, "Frank", functionAndComparator([](const Person* p) {return p->getName(8,'b');}));
    if (frank) std::cout << frank->getName(8,'b') << " found.\n";

    Person* zack = binarySearch (peopleList, "Zack", functionAndComparator([](const Person* p) {return p->getName(2,'c');}));
    if (zack) std::cout << zack->getName(2,'c') << " found.\n";

    Person* mark = binarySearch (peopleList, "Mark", functionAndComparator([](const Person* p) {return p->getName(6,'d');}));
    if (mark) std::cout << mark->getName(6,'d') << " found.\n";
}

Solution

  • In my opinion

    Person* person = binarySearch (people, "Tom",
      [](Person* p, int n, char c) {return p->getName(n,c);},
     [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}, 5, 'a');
    

    is a horrible syntax. Your binarySearch function is resposible for way too many things.

    But first, what went wrong: Your ambiguous error occurred because a lambda is not a std::function. It tries to deduce the std::function type from the lambda, and fails because they are unrelated types. The ability to deduce Args... from somewhere else doesn't help.

    You can wrap your std::function arguments in:

    template<class T>struct tag{using type=T;};
    template<class Tag>using type_t=typename Tag::type;
    template<class T>using identity=type_t<tag<T>>;
    

    identity< std::function< whatever... > > and your code will start to compile (as Args... is deduced elsewhere). identity<?> blocks template type deduction on that argument, so the compiler doesn't try anymore, and instead deduces the type from the other arguments.

    However, this is not a good solution.

    A better solution is to make the type of f and c be F and C -- don't make them into std::functions at all. This removes pointless type-erasure overhead, and removes the need for identity<?>

    This is still not a good solution, because your template function does a bunch of things, few of them well. Instead, decompose your operation into simpler problems, then compose those together:


    First, we already have std::equal_range, which is going to be a better binary search than any you are likely to write. Writing a function that returns a single element, and takes a container, seems reasonable, as working with iterators is annoying.

    To pull this off, first we write some range-based boilerplate:

    namespace adl_aux {
      using std::begin; using std::end;
      template<class R>
      auto adl_begin(R&&)->decltype(begin(std::declval<R>()));
      template<class R>
      auto adl_end(R&&)->decltype(end(std::declval<R>()));
    }
    template<class R>
    using adl_begin = decltype(adl_aux::adl_begin(std::declval<R>));
    template<class R>
    using adl_end = decltype(adl_aux::adl_end(std::declval<R>));
    
    template<class R>using iterator_t = adl_begin<R>;
    template<class R>using value_t = std::remove_reference_t<decltype(*std::declval<iterator_t<R>>())>;
    

    This allows us to support std:: containers and arrays and 3rd party iterable containers and ranges. The adl_ stuff does argument dependent lookup of begin and end for us. The iterator_t and value_t does SFINAE-friendly determining of the value and iterator type of a range.

    Now, bin_search on top of that boilerplate:

    template<class R, class T, class F=std::less<T>>
    value_t<R>* bin_search( R&& r, T&& t, F&& f={} ) {
      using std::begin; using std::end;
      auto range = std::equal_range( begin(r), end(r), std::forward<T>(t), std::forward<F>(f) );
      if (range.first==range.second) return nullptr;
      return std::addressof( *range.first ); // in case someone overloaded `&`
    }
    

    which returns a pointer to the element t under the ordering f presuming R is sorted under it if it exists, and otherwise nullptr.

    The next part is your ordering mess:

    [](Person* p, int n, char c) {return p->getName(n,c);},
    [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}, 5, 'a'
    

    first, get rid of that args...:

    [](int n, char c){
      return [n,c](Person* p) {return p->getName(n,c);}
    }(5,'a'),
    [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}
    

    if you really need to do it on one line, do the binding directly.

    Next, we want order_by:

    template<class F, class C>
    struct order_by_t : private F, private C {
      F const& f() const { return *this; }
      C const& c() const { return *this; }
      template<class T>
      auto f(T&&t)const
      ->decltype( std::declval<F const&>()(std::declval<T>()) )
      {
        return f()(std::forward<T>(t));
      }
      template<class T, class... Unused> // Unused to force lower priority
      auto f(T&&t, Unused&&... ) const
      -> std::decay_t<T>
      { return std::forward<T>(t); }
      template<class Lhs, class Rhs>
      bool operator()(Lhs&& lhs, Rhs&& rhs) const {
        return c()( f(std::forward<Lhs>(lhs)), f(std::forward<Rhs>(rhs)) );
      }
      template<class F0, class C0>
      order_by_t( F0&& f_, C0&& c_ ):
        F(std::forward<F0>(f_)), C(std::forward<C0>(c_))
      {}
    };
    template<class C=std::less<>, class F>
    auto order_by( F&& f, C&& c={} )
    -> order_by_t<std::decay_t<F>, std::decay_t<C>>
    { return {std::forward<F>(f), std::forward<C>(c)}; }
    

    order_by takes a projection from a domain to a range, and optionally an ordering on that range, and produces an ordering on the domain.

    order_by(
      [](int n, char c){
        return [n,c](Person const* p)
        ->decltype(p->getName(n,c)) // SFINAE enabled
        {return p->getName(n,c);};
      }(5,'a'),
      [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}
    }
    

    is now an ordering on Person const*s that follows your requirements.

    We then feed this into bin_search:

    auto ordering = order_by(
      [](int n, char c){
        return [n,c](Person const* p)
        ->decltype(p->getName(n,c)) // SFINAE enabled
        {return p->getName(n,c);}
      }(5,'a'),
      [](const std::string& x, const std::string& y) {return x.compare(y) < 0;}
    );
    Person*const* p = bin_search( people, "Tom", ordering );
    

    now, some care had to be made to make order_by a "transparent" function object, where it accepts both things that can be projected (under the projection) and cannot be (which are passed directly to the comparator).

    This requires that the projection operation be SFINAE friendly (ie, that it "fail early"). To this end, I explicitly determined its return type. (Below we see that this is not required, but it may be in more complex situations).

    Live example.

    Amusingly, your [](const std::string& x, const std::string& y) {return x.compare(y) < 0;} agrees with operator< on a std::string, so you could drop that (and make order_by simpler). However, I suspect your real use case needs it, and it is a useful feature to fortify order_by with.

    Finally, note that this part:

      [](int n, char c){
        return [n,c](Person const* p)
        ->decltype(p->getName(n,c)) // SFINAE enabled
        {return p->getName(n,c);}
      }(5,'a'),
    

    is ugly, and can be replaced with:

      [](Person const* p)
        ->decltype(p->getName(5,'a')) // SFINAE enabled
        {return p->getName(5,'a');}
    

    which is less ugly. Also, because the parameter check of the lambda is enough, we can drop the SFINAE explicit return type stuff:

      [](Person const* p)
        {return p->getName(5,'a');}
    

    and we are done. Simpler example:

    auto ordering = order_by(
       [](Person const* p)
         {return p->getName(5,'a');}
    );
    Person*const* p = bin_search( people, "Tom", ordering );
    

    or even:

    Person*const* p = bin_search( people, "Tom",
      order_by( [](Person const* p) {return p->getName(5,'a');} )
    );
    

    which looks far less ugly, no?

    Oh, and:

    using std::literals;
    Person*const* p = bin_search( people, "Tom"s,
      order_by( [](Person const* p) {return p->getName(5,'a');} )
    );
    

    might have better performance, as it will avoid repeatedly constructing a std::string("Tom") on every comparison. Similarly, a getName that returns a std::string const& (if possible) can also give a performance boost. The "projection lambda" might have to have a ->decltype(auto) in it to realise this second boost.

    I used some C++14 above. The std::remove_reference_t<?> (and similar) aliases can be replaced with typename std::remove_reference<?>::type, or you can write your own _t aliases. The advice to use decltype(auto) can be replaced with decltype(the return expression) in C++11.

    order_by_t uses inheritance to store F and C because they are likely to be empty classes, so I wanted to exploit the empty base optimization.