c++ multithreading segmentation-fault qnx

Same Program code with same compiler leads to different binaries


I have an issue with my code that has some very strange symptoms.

  1. The code is compiled on my computer with the following versions:

    a. GCC Version: 4.4.2

    b. CMake version: 2.8.7

    c. QNX (operating system) version: 6.5.0

The code segfaults while freeing some memory on exit from a function (it does not die on any particular statement, only as the function returns).

The weird things about this are:

  1. The code does it in release mode but not debug mode:

    a. The code is threaded, so this suggests a race condition.

    b. I cannot debug by putting it in debug mode.

  2. When compiled on a workmate's machine with the same versions of everything, the code does not have this problem.

    a. The weird thing about this is not only that my workmate's build works, but also that the binary created from compiling on his machine, which is the same, is about 6 MB bigger.

Annoyingly, I cannot post the code because it is too big and it is for work. Can anyone point me along a path to fixing this?

Since I am using QNX I am limited in my debug tools: I cannot use Valgrind since it is not supported on QNX, and GDB doesn't really help.

I am looking for anyone who has had the same or a similar problem, what the cause was, and how they fixed it.

EDIT:

Sooo... I found out what it was, but I'm still a bit confused about how it happened.

The culprit code was this:

Eigen::VectorXd msBb = data.modelSearcher->getMinimumBoundingBox();

where the definition for getMinimumBoundingBox is this:

Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();

It returns a VectorXd which is always initialised as VectorXd output(6, 1). So I immediately thought it must be because msBb is not being initialised, and I tried changing the call to this:

Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();

But this didn't work. In fact I had to fix it by changing the definition of the function to this:

void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);

and the call to this:

Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);

So now the new question:

What the hell? Why didn't the first change work when the second did? Why do I have to pass by reference? Oh, and the big question: how the hell didn't this break when my co-worker compiled it and I ran it? It's a straight-out memory error, so surely it shouldn't depend on which computer compiles it, especially since the compiler and all the other important things are the same!?

Thanks for your help guys.


Solution

  • ... the binary created from compiling on his machine, which is the same, is about 6 MB bigger

    It's worth figuring out what the difference is (even if it's just the case that his build hides, while yours exposes, a real bug):

    • double-check you're compiling exactly the same code (no un-committed local changes, no extra headers in the include search path, etc.)
      • triple-check by adding a -E switch to your gcc arguments in cmake, so it will pre-process your files with the same include path as regular compilation; diff the pre-processor output
    • compare output from nm or objdump, or whatever you have available, for your two linked executables: if some system or 3rd-party library is a different version on one box, it may show up here
    • compare output from ldd if it's dynamically linked, make sure they're both getting the same library versions
      • compare the library versions it actually gets at runtime too, if possible. Hopefully you can do one of: run pldd, compare the .so entries in /proc/pid/map, run the process under strace/dtrace/truss and compare the runtime linker activity
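
One more cheap comparison, in the same spirit as the checks above, is to have each build report the compile-time settings it was actually built with. The following is only a sketch: `build_fingerprint` is a hypothetical helper name, and it assumes a GCC-compatible compiler that predefines `__VERSION__` and, when optimising, `__OPTIMIZE__`. It is written to be acceptable to an older C++03 compiler.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original code base): collects a few
// compile-time facts so that two builds of the "same" source can be diffed.
std::string build_fingerprint() {
    std::ostringstream out;

    // __VERSION__ is predefined by GCC-compatible compilers.
#ifdef __VERSION__
    out << "compiler=" << __VERSION__ << '\n';
#else
    out << "compiler=unknown\n";
#endif

    // NDEBUG is normally defined by release configurations.
#ifdef NDEBUG
    out << "NDEBUG=defined\n";
#else
    out << "NDEBUG=undefined\n";
#endif

    // GCC predefines __OPTIMIZE__ whenever optimisation is enabled.
#ifdef __OPTIMIZE__
    out << "optimize=on\n";
#else
    out << "optimize=off\n";
#endif

    out << "__cplusplus=" << __cplusplus << '\n';
    return out.str();
}
```

Printing the result of `build_fingerprint()` at start-up on both machines and diffing the two outputs would show quickly whether one build is, for example, unoptimised or missing `NDEBUG`, which could also account for part of a size difference.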

    As for the code ... if this doesn't work:

    Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();
    // ...
    Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();
    

    and this does:

    void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);
    // ...
    Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);
    

    you presumably have a problem with the assignment operator. If it does a shallow copy and there is dynamically-allocated memory in the vector, you'll end up with two vectors holding the same pointer, and they'll both free/delete it.

    Note that if the operator isn't defined at all, the compiler-generated default copies each member in turn, which for a raw pointer member is exactly this shallow copy.
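
To make the shallow-copy scenario concrete, here is a minimal sketch using two toy types. `ShallowVec` and `DeepVec` are hypothetical names invented for illustration; they are not Eigen's implementation and say nothing about how Eigen's own assignment operator behaves. `ShallowVec` relies on the compiler-generated assignment, while `DeepVec` defines a copy constructor, copy assignment and destructor itself. `ShallowVec` deliberately has no destructor, so the demonstration can show the shared pointer without actually triggering the double free.

```cpp
#include <algorithm>
#include <cstddef>

// Toy type with a raw pointer and NO user-defined copy assignment.
// The compiler-generated operator= copies the pointer, not the buffer.
struct ShallowVec {
    double* data;
    std::size_t size;
};

// Toy type that owns its buffer and copies it properly.
class DeepVec {
public:
    explicit DeepVec(std::size_t n) : data_(new double[n]()), size_(n) {}

    DeepVec(const DeepVec& other)
        : data_(new double[other.size_]), size_(other.size_) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    DeepVec& operator=(const DeepVec& other) {
        if (this != &other) {
            // Allocate and fill the new buffer before releasing the old one.
            double* fresh = new double[other.size_];
            std::copy(other.data_, other.data_ + other.size_, fresh);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    ~DeepVec() { delete[] data_; }

    const double* data() const { return data_; }

private:
    double* data_;
    std::size_t size_;
};

// Returns true if, after assignment, both objects point at the same buffer.
bool shallow_assignment_shares_buffer() {
    double* buf = new double[6]();
    ShallowVec a = {buf, 6};
    ShallowVec b = {0, 0};
    b = a;  // member-wise copy: b.data now aliases a.data
    bool shared = (a.data == b.data);
    delete[] buf;  // released exactly once here, by hand
    return shared;
}

bool deep_assignment_shares_buffer() {
    DeepVec a(6);
    DeepVec b(1);
    b = a;  // user-defined operator= allocates a separate buffer
    return a.data() == b.data();
}
```

If `ShallowVec` had a destructor calling `delete[] data`, both `a` and `b` would release the same buffer when they go out of scope, which is the kind of failure on function exit described in the question. `DeepVec` avoids that because each object owns its own allocation.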