2011-02-09

Profiling Boost Multi-Array

Worried by this StackOverflow thread, I ran my own experiments to
profile element access speed in Boost MultiArrays.


First of all, the thread is correct: BOOST_MA needs about 3 times as
much time to iterate over the array as Native.


The additional cases I checked are quite interesting, though:



BOOST_IT
only has a 30% overhead over Native. Note that you can
get around the verbose iterator types by using the auto keyword
and -std=gnu++0x as a compiler argument (works for gcc version >= 4.4).

RAW_POINTER
is actually slower than Native. This is a typical
case of "do not try to be smarter than your compiler". Note that this
is the same algorithm as the one you get when you use std::fill.

unsigned int
index types are significantly slower (about 2x) in the
Native condition than int index types. This is bad, because
intuitively, one would choose unsigned int for array indexing.

My code:





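#define _SCL_SECURE_NO_WARNINGS   // MSVC only: must precede any standard header
#define BOOST_DISABLE_ASSERTS     // turn off MultiArray's range checks
#include <cstdio>                 // printf
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/multi_array.hpp>
using namespace boost::posix_time;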

int main(int argc, char* argv[])
{
  typedef int idx;
  const idx X_SIZE = 400;
  const idx Y_SIZE = 400;
  const idx ITERATIONS = 5000;

  double *nativeMatrix = new double [X_SIZE * Y_SIZE];

  typedef boost::multi_array_ref<double, 2> ImageArrayType;
  ImageArrayType boostMatrix(nativeMatrix, boost::extents[X_SIZE][Y_SIZE]);    

  // Native condition
  ptime startTime = microsec_clock::universal_time();
  for (idx i = 0; i < ITERATIONS; ++i)
      for (idx y = 0; y < Y_SIZE; ++y)
          for (idx x = 0; x < X_SIZE; ++x)
              nativeMatrix[x + (y * X_SIZE)] = 2.345;
  ptime endTime = microsec_clock::universal_time();
  printf("[Native]Elapsed time: %6.3f seconds\n", time_period(startTime, endTime).length().total_milliseconds()/1000.f);

  // Other conditions: select one with -DRAW_POINTER, -DBOOST_IT or -DBOOST_MA
  startTime = microsec_clock::universal_time();
  for (idx i = 0; i < ITERATIONS; ++i)
    {
#ifdef RAW_POINTER
      double* end = boostMatrix.data() + X_SIZE*Y_SIZE;
      for(double* begin=boostMatrix.data(); begin!=end; ++begin)
        *begin = 2.345;
#elif defined(BOOST_IT)
      for(auto it=boostMatrix.begin(); it!= boostMatrix.end(); ++it)
          for(auto it2=(*it).begin(); it2!=(*it).end(); ++it2)
              *it2 = 2.345;
#elif defined(BOOST_MA)
      for (idx y = 0; y < Y_SIZE; ++y)
         for (idx x = 0; x < X_SIZE; ++x)
             boostMatrix[y][x] = 2.345;
#endif
    }
  endTime = microsec_clock::universal_time();
  printf("[Boost] Elapsed time: %6.3f seconds\n", time_period(startTime,endTime).length().total_milliseconds()/1000.f );

  delete[] nativeMatrix;  // release the buffer shared by both views
  return 0;
}

All compiled and executed using


g++-4.4 -O3 -g0 -DNDEBUG -march=native -mtune=native -ffast-math -std=gnu++0x % && ./a.out
