
valarray in-place operation gives a different result than a temporary assignment


The following program:

#include<iostream>
#include<valarray>

using namespace std;

int main() {
  int init[] = {1, 1};

  // Example 1
  valarray<int> a(init, 2);
  // In-place assignment
  a[slice(0, 2, 1)] = a[slice(0, 2, 1)] + valarray<int>(a[slice(0, 2, 1)]) * a[0];

  for (int k = 0; k < 2; ++ k) {
    cout << a[k] << ' ';  // Outputs 2 3
  }
  cout << endl;

  // Example 2
  valarray<int> b(init, 2);
  // Temporary assignment
  valarray<int> r = b[slice(0, 2, 1)] + valarray<int>(b[slice(0, 2, 1)]) * b[0];
  b[slice(0, 2, 1)] = r;

  for (int k = 0; k < 2; ++ k) {
    cout << b[k] << ' '; // Outputs 2 2
  }
  cout << endl;
  return 0;
}

outputs:

2 3
2 2

The correct answer is 2 2 (<1 1> + <1 1> * 1 = <2 2>). Why is the in-place version outputting something different?

In case it matters, I'm compiling this way:

g++ myprogram.cpp -o myprogram

And the output of g++ -v is:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.5' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5) 

Solution

  • First, a[slice(0, 2, 1)] has type slice_array<T>, and there is no overload of operator+ that takes a slice_array<T> object or reference as a parameter.

    Note that the one overload that could work, operator+(const valarray<T>&, const valarray<T>&), is a function template. Although slice_array<T> is implicitly convertible to valarray<T>, template argument deduction does not consider implicit conversions, so T cannot be deduced from the slice_array<T> argument.

    So, strictly speaking, your code should not compile. In fact, Clang rejects it.


    Second, you should know that implementations use optimization techniques for valarray operations. One well-known technique is expression templates, and it is what causes your unexpected result. To see how it works, consider a simpler example that reproduces the problem:

    valarray<int> a{1, 1};
    a = a + a[0];
    // now a is {2, 3} while {2, 2} is expected
    

    The key idea of expression templates is to postpone evaluating an expression until its value is actually needed, so that extra temporaries are avoided.

    In the example above, the implementation may make the result of a + a[0] a proxy object instead of a valarray<int> temporary. The proxy object stores only the action (not the resulting value) of "adding a[0] to a".

    When the proxy object is then assigned to a, the actual evaluation occurs. From the stored action, the implementation assigns a[i] + a[0] to a[i] for each i. If the proxy refers to a[0] rather than holding a copy of its value, the order of these assignments affects the outcome. For example, if it first assigns a[0] + a[0] to a[0], and then assigns a[1] + a[0] (where a[0] has already changed to 2) to a[1], the unexpected result {2, 3} is produced.

    The standard allows such proxy objects to exist, but it does not seem to specify clearly how they should behave. I personally think this is a bug in the implementation, because simply evaluating a[0] and storing its value before the assignment would solve the problem with little performance loss.