2011-03-25

Easy Parallelization with C++0X lambda functions, Thread Building Blocks

Lambda functors in C++0X


The relatively new gcc-4.5 release supports lambda expressions,
which – in contrast to boost.lambda, boost.bind and the like – provide
easy capturing of variables in context in the lambda expression.
This finally makes the STL algorithms usable, such as





struct add_val{
  add_val(double d):val(d){}
  void operator()(const double& d){
    d+=val;
  }
  double val;
};


int main(){
  std::vector<double> v;
  // ... fill vector

  double inc = 3.0;

  // the old way of doing things
  add_val av(inc);
  std::for_each(v.begin(),v.end(),av);

  // using a boost.lambda function
  std::for_each(v.begin(),v.end(), _1+=inc);

  // using a c++0x lambda function
  std::for_each(v.begin(),v.end(), [=](double& d){d+=inc;});
}

The "old way" primarily has inconveniences for the programmer:
  • We need to define a struct outside the scope, even for tiny
    functionality.
  • We need to explicitly capture variables from the scope of the
    surrounding code (here: inc) in the struct.

Boost.Lambda tried to resolve this problem, in an elegant way I
believe. However, there are still shortcomings of this approach:



  • Variables have unintuitive names (_1, _2)
  • The code in the lambda expression is not really a block of code. It is an expression, where parts may be evaluated
    surprisingly.
  • Also, since this is an expression, conditionals and loops must be
    expressed in an awkward (that is, non-C++) way using if_, for_
    and so on.

The new C++0X lambda syntax gets rid of all these problems. While the
syntax looks a bit strange at first, it is much more readable than
boost.lambda constructs. The block of code is not out of scope, all
names of the surrounding code can be used as copy ([=]) or
reference ([&]).


You might wonder, why we do not just use boost.foreach and get away
with writing





#include <boost/foreach.hpp>
#define foreach BOOST_FOREACH
// ...as before...
foreach(double& d, v){
  d+=inc;
}

// in c++0x, not yet implemented, this will be
for(double& d : v){
  d+=inc;
}
… which is an idiom quite well-known in other languages. The fun
part is, that we cannot change what for does, but we can change the
implementation of for_each. This is one of the things that the Intel (R) Threading Building Blocks library does.

Parallelizing your for-loop by changing two lines


A nice trick now is to change your (side-effect-free) for-loops like
this:





#include <tbb/parallel_for_each.h>

// before
foreach(double& d, v){
  d+=inc;
}

// after
tbb::parallel_for_each(v.begin(),v.end(),[=](double& d){
  d+=inc;
});

… and everything in this loop is automatically run in parallel.

Similar things can be done with OpenMP:



int end = v.size();
#pragma omp parallel for
for(int i=0;i<end;i++){
   v[i]+=inc;
}

but this is already much more intrusive. Furthermore, the loop index
must be an int and the last index must be known. While
variables may be passed as copy (using private (inc)), these
variables are private to the thread, not to the "wrapped" function. Of
course, OpenMP gives you much more fine-grained control over
parallelization as well.