I have an issue with my code that has some very strange symptoms.
The code is compiled on my computer with the following versions:
a. GCC Version: 4.4.2
b. CMAKE verson: 2.8.7
c. QNX (operating system) version: 6.5.0
And the code has a segfault whilst freeing some memory and exiting from a function (not dying on any code, just on the exit from a function).
The weird things about this are:
The code does it in release mode but not debug mode:
a. The code is threaded so this indicates a race condition.
b. I cannot debug by putting it in debug mode.
The code when compiled on a workmates machine with the same versions of everything, does not have this problem.
a. The wierd things about this are that the workmates code works, but also that the binary created from compiling on his machine, which is the same, is about 6mB bigger.
Now annoyingly I cannot post the code because it is too big and also for work. But can anyone point me along a path to fixing this.
Since I am using QNX I am limited for my debug tools, I cannot use Valgrind and since it is not supported in QNX, GDB doesn't really help.
I am looking for anyone who has had a similar/same problem and what the cause was and how they fixed it.
EDIT:
Sooo... I found out what it was, but im still a bit confused about how it happened.
The culprit code was this:
Eigen::VectorXd msBb = data.modelSearcher->getMinimumBoundingBox();
where the definition for getMinimumBoundingBox
is this:
Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();
and it returns a VectorXd which is always initialised as VectorXd output(6, 1)
. So I immediately thought, right it must be because the VectorXd is not being initialised, but changing it to this:
Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();
But this didn't work. In fact I had to fix it by changing the definition of the function to this:
void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);
and the call to this
Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);
So now the new question:
What the hell? Why didn't the first change work but the second did, why do I have to pass by reference? Oh and the big question, how the hell didn't this break when my co-worker compiled it and I ran it? Its a straight out memory error, surely it shouldn't depend on which computer compiles it, especially since the compiler and all the other important things are the same!!??
Thanks for your help guys.
... the binary created from compiling on his machine, which is the same, is about 6mB bigger
It's worth figuring out what the difference is (even if it's just the case that his build hides, while yours exposes, a real bug):
-E
switch to your gcc arguments in cmake, so it will pre-process your files with the same include path as regular compilation; diff the pre-processor outputnm
or objdump
or whatever you have to for your two linked executables: if some system or 3rd-party library is a different version on one box, it may show up hereldd
if it's dynamically linked, make sure they're both getting the same library versions
pldd
, compare the .so
entries in /proc/pid/map
, run the process under strace
/dtrace
/truss
and compare the runtime linker activityAs for the code ... if this doesn't work:
Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();
// ...
Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();
and this does:
void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);
// ...
Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);
you presumably have a problem with the assignment operator. If it does a shallow copy and there is dynamically-allocated memory in the vector, you'll end up with two vectors holding the same pointer, and they'll both free
/delete
it.
Note that if the operator isn't defined at all, the default is to do this shallow copy.