I am unable to find examples that explain how to port lines of code that use the reducer_vector
class in Intel Cilk Plus or that use the reducer_list
class in GCC Cilk Plus to OpenCilk 2.1 (provides the cilk_reducer
keyword).
For instance, consider the following lines of code (assuming GCC Cilk Plus) shown as an example in David Bindel's slides (Beyond C/C++) https://www.cs.cornell.edu/~bindel/class/cs5220-f15/slides/2015-11-17-languages.pdf:
#include <cilk/cilk.h>
#include <cilk/reducer_list.h>
void reducer_list_test() {
using namespace std;
cilk::reducer< cilk::op_list_append<char> > letters_reducer;
// Build the list in parallel
cilk_for (char ch = ’a’; ch <= ’z’; ch++) {
simulated_work();
letters_reducer->push_back(ch);
}
// Reducer result as a standard STL list then output
const list<char> &letters = letters_reducer.get_value();
cout << "Letters from reducer_list:";
for (auto i = letters.begin(); i != letters.end(); i++)
cout << " " << *i;
cout << endl;
}
Would someone provide an example to port the above lines of code to OpenCilk 2.1?
This works for me, I am using std::list::splice
for appends, but you can also use std::vector::insert
or std::deque::insert
(probably faster since it would have less overhead, it all depends on the type of the objects you are appending).
#include <cilk/cilk.h>
#include <list>
#include <iostream>
using vec = std::list<char>;
template<class C>
void print_vl(const C& v)
{
for (auto i = v.begin(); i != v.end(); i++) std::cout << " " << *i;
}
void identity(void* view) { new (view) vec(); }
void append(void* left, void* right)
{
auto& l = *static_cast<vec*>(left);
auto& r = *static_cast<vec*>(right);
std::cout << "Inserting ";
print_vl(r);
std::cout << ", into ";
print_vl(l);
l.splice(l.end(), r);
std::cout << ". Result: ";
print_vl(l);
std::cout << "\n";
r.~vec();
}
void reducer_list_test()
{
vec cilk_reducer(identity, append) letters_reducer{};
// Build the list in parallel
cilk_for(char ch = 'a'; ch <= 'z'; ch++)
{
// simulated_work();
letters_reducer.push_back(ch);
}
// Reducer result as a standard STL list then output
const std::list<char> letters(letters_reducer.begin(), letters_reducer.end());
std::cout << "Letters from reducer_list:";
print_vl(letters);
std::cout << std::endl;
}
int main() { reducer_list_test(); }
When compiling with cilk sanitizer:
clang++ -fopencilk -fsanitize=cilk -O3 test.cpp
I get the following output:
./a.out
Running Cilksan race detector.
Inserting b, into a. Result: a b
Inserting c, into a b. Result: a b c
Inserting d, into a b c. Result: a b c d
Inserting e, into a b c d. Result: a b c d e
Inserting f, into a b c d e. Result: a b c d e f
Inserting g, into a b c d e f. Result: a b c d e f g
Inserting h, into a b c d e f g. Result: a b c d e f g h
Inserting i, into a b c d e f g h. Result: a b c d e f g h i
Inserting j, into a b c d e f g h i. Result: a b c d e f g h i j
Inserting k, into a b c d e f g h i j. Result: a b c d e f g h i j k
Inserting l, into a b c d e f g h i j k. Result: a b c d e f g h i j k l
Inserting m, into a b c d e f g h i j k l. Result: a b c d e f g h i j k l m
Inserting n, into a b c d e f g h i j k l m. Result: a b c d e f g h i j k l m n
Inserting o, into a b c d e f g h i j k l m n. Result: a b c d e f g h i j k l m n o
Inserting p, into a b c d e f g h i j k l m n o. Result: a b c d e f g h i j k l m n o p
Inserting q, into a b c d e f g h i j k l m n o p. Result: a b c d e f g h i j k l m n o p q
Inserting r, into a b c d e f g h i j k l m n o p q. Result: a b c d e f g h i j k l m n o p q r
Inserting s, into a b c d e f g h i j k l m n o p q r. Result: a b c d e f g h i j k l m n o p q r s
Inserting t, into a b c d e f g h i j k l m n o p q r s. Result: a b c d e f g h i j k l m n o p q r s t
Inserting u, into a b c d e f g h i j k l m n o p q r s t. Result: a b c d e f g h i j k l m n o p q r s t u
Inserting v, into a b c d e f g h i j k l m n o p q r s t u. Result: a b c d e f g h i j k l m n o p q r s t u v
Inserting w, into a b c d e f g h i j k l m n o p q r s t u v. Result: a b c d e f g h i j k l m n o p q r s t u v w
Inserting x, into a b c d e f g h i j k l m n o p q r s t u v w. Result: a b c d e f g h i j k l m n o p q r s t u v w x
Inserting y, into a b c d e f g h i j k l m n o p q r s t u v w x. Result: a b c d e f g h i j k l m n o p q r s t u v w x y
Inserting z, into a b c d e f g h i j k l m n o p q r s t u v w x y. Result: a b c d e f g h i j k l m n o p q r s t u v w x y z
Letters from reducer_list: a b c d e f g h i j k l m n o p q r s t u v w x y z
Cilksan detected 0 distinct races.
Cilksan suppressed 0 duplicate race reports.
Without the sanitizer I get something more optimized, but correct nonetheless:
./a.out
Inserting q, into n o p. Result: n o p q
Inserting r, into n o p q. Result: n o p q r
Inserting s, into n o p q r. Result: n o p q r s
Inserting n o p q r Inserting sw, into a b xc , into d e f g h ti j k l u v. Result: mt. Result: a b c d e u v fw x
Inserting y, into t u v w x. Result: t u v w x y
g h i j k l m n o p q r s
Inserting t u v w x y, into a b c d e f g h i j k l m n o p q r s. Result: a b c d e f g h i j k l m n o p q r s t u v w x y
Inserting z, into a b c d e f g h i j k l m n o p q r s t u v w x y. Result: a b c d e f g h i j k l m n o p q r s t u v w x y z
Letters from reducer_list: a b c d e f g h i j k l m n o p q r s t u v w x y z
A faster (I haven't benchmarked) version using deque would be like follows:
#include <cilk/cilk.h>
#include <deque>
#include <iostream>
using vec = std::deque<char>;
template<class C>
void print_vl(const C& v)
{
for (auto i = v.begin(); i != v.end(); i++) std::cout << " " << *i;
}
void identity(void* view) { new (view) vec(); }
void append(void* left, void* right)
{
auto& l = *static_cast<vec*>(left);
auto& r = *static_cast<vec*>(right);
l.insert(l.end(), r.begin(), r.end());
r.~vec();
}
void reducer_list_test()
{
vec cilk_reducer(identity, append) letters_reducer{};
// Build the list in parallel
cilk_for(char ch = 'a'; ch <= 'z'; ch++)
{
// simulated_work();
letters_reducer.push_back(ch);
}
std::cout << "Letters from reducer_list:";
print_vl(letters_reducer);
std::cout << std::endl;
}
int main() { reducer_list_test(); }
Notice that deque implementation performance varies a lot between standard library implementations. Have a look here.