Tags: c++, linker, g++, static-libraries

Valid pointer from static library code treated as a nullptr in the application code


I'm running a demo C++ application based on the TensorFlow Lite Micro framework on a Linux x86-64 machine. I've built the default library (an ar archive, i.e. a static library) and linked it against my demo app.

After some debugging I noticed that the library code returns a tensor whose data pointer is valid inside the library. In the application, however, the same data pointer reads as nullptr, and dereferencing it makes the application segfault.

The library class method called from the application:

TfLiteTensor* MicroInterpreter::input(size_t index) {
  const size_t length = inputs_size();
  if (index >= length) {
    MicroPrintf("Input index %d out of range (length is %d)", index, length);
    return nullptr;
  }

  // Debug output added while investigating the problem.
  if (input_tensors_[index]->data.data == nullptr) {
    printf("returning null data pointer\n");
  }

  // %zu is the correct specifier for size_t; %p expects a void*.
  printf("returning data (idx = %zu) at %p\n", index,
         static_cast<void*>(input_tensors_[index]->data.f));

  printf("data.f[0] == %f\n", (double)input_tensors_[index]->data.f[0]);
  input_tensors_[index]->data.f[0] = 5.f;
  printf("data.f[0] == %f\n", (double)input_tensors_[index]->data.f[0]);

  return input_tensors_[index];
}

The relevant part of the application code which calls the above library method directly:

  TfLiteTensor* input = interpreter.input(0);
  printf("received pointer at %p\n", static_cast<void*>(input->data.f));

Output:

returning data (idx = 0) at 0x7ffe4c0f5a60
data.f[0] == 0.000000
data.f[0] == 5.000000
received pointer at (nil)

In other words, the data pointer is valid and the memory it points to is accessible, but only from the library code: as soon as the tensor is returned to the application, the same data pointer reads as nullptr.

I suspect the problem is in how I link the final executable, but I can't see what is wrong. I've tried it in two ways:

g++ test.cpp -o test.out -I.../tflite-micro/ -I.../tflite-micro/tensorflow/lite/micro/tools/make/downloads/flatbuffers/include -I.../tflite-micro/tensorflow/lite/micro/tools/make/downloads/gemmlowp/ --std=c++17 -L.../tflite-micro/gen/linux_x86_64_default/lib/ -l:libtensorflow-microlite.a && ./test.out

and:

g++ test.cpp .../tflite-micro/gen/linux_x86_64_default/lib/libtensorflow-microlite.a -o test.out -I.../tflite-micro/ -I/home/gstukelj/projects/plume/tflite-micro/tensorflow/lite/micro/tools/make/downloads/flatbuffers/include -I.../tflite-micro/tensorflow/lite/micro/tools/make/downloads/gemmlowp/ --std=c++17 && ./test.out

I also tried adding -static, building the library with -fpic, building without a --std flag or with different values for it, and setting LD_LIBRARY_PATH to .../tflite-micro/gen/linux_x86_64_default/lib/, but none of that changed anything.

The toolchain and kernel/distro used:

g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

5.15.0-76-generic #83~20.04.1-Ubuntu SMP

EDIT:

I've also printed the value of input_tensors_[index] itself (the tensor pointer) on both sides, and it is the same in the library and in the application:

returning data (idx = 0) at 0x7ffd524366b0
returning tensor (idx = 0) at 0x7ffd5243e260
data.f[0] == 0.000000
data.f[0] == 5.000000
received data pointer at (nil)
received tensor pointer at 0x7ffd5243e260

If I call the same library method a second time, the library still sees the valid data pointer, even after the application has already read it as nullptr:

  TfLiteTensor* input = interpreter.input(0);

  printf("received data pointer at %p\n", static_cast<void*>(input->data.f));
  printf("received tensor pointer at %p\n", static_cast<void*>(input));

  TfLiteTensor* input2 = interpreter.input(0);

  printf("received data pointer at %p\n", static_cast<void*>(input2->data.f));
  printf("received tensor pointer at %p\n", static_cast<void*>(input2));

Output:

returning data (idx = 0) at 0x7fffec3695c0
returning tensor (idx = 0) at 0x7fffec371170
data.f[0] == 0.000000
data.f[0] == 5.000000
received data pointer at (nil)
received tensor pointer at 0x7fffec371170
returning data (idx = 0) at 0x7fffec3695c0
returning tensor (idx = 0) at 0x7fffec371170
data.f[0] == 5.000000
data.f[0] == 5.000000
received data pointer at (nil)
received tensor pointer at 0x7fffec371170

Solution

  • As pointed out in the comments, the problem was a violation of the One Definition Rule (ODR). The header included by both the library and the client code contains two definitions of struct TfLiteTensor, selected by an #ifdef. The library was built with the macro defined but the client code was not, so the two sides disagreed on the struct's layout and read the data member at different offsets. Defining the same macro in the client build as in the library build resolved the issue.