
Why is LLDB in C++ able to print my entire data structure but not able to print subcomponents?


I'm using macOS Sonoma 14.5 with LLDB 1500.0.404.7 installed via the Xcode tools. I noticed that my LLDB behaves in a rather confusing way: it is able to print some std:: containers in their entirety, yet it is not able to print subcomponents of those same containers. In particular, I am able to execute the following p primary_map command:

(lldb) p primary_map
(std::unordered_map<int, std::unordered_set<int> >) size=2 {
  [0] = {
    first = 1
    second = size=2 {
      [0] = 5
      [1] = 4
    }
  }
  [1] = {
    first = 0
    second = size=2 {
      [0] = 2
      [1] = 1
    }
  }
}

So clearly LLDB can show me the entire map. Yet for some reason, if I execute p primary_map[0], I get:

(lldb) p primary_map[0]
error: Couldn't lookup symbols:
  __ZNSt3__113unordered_mapIiNS_13unordered_setIiNS_4hashIiEENS_8equal_toIiEENS_9allocatorIiEEEES3_S5_NS6_INS_4pairIKiS8_EEEEEixEOi

LLDB is not able to render a subcomponent of the data structure.

Question:

I want to know why this is happening, how to prevent it in the future, and, if it can't be prevented, what the best workaround is.

Context:

I have a sample piece of code. It creates a map from integers to sets of integers by reading the contents of a file. In particular, the number of rows in the file and the number of integers per row are NOT known at compile time and are only discovered at run time by reading the file itself (somehow I think this is important).

working_driver.cpp

#include <iostream>
#include <fstream>
#include <string>
#include <unordered_set>
#include <unordered_map>

using namespace std;

unordered_map<int, unordered_set<int> > primary_map;

int main() {
    ifstream input("sample_data_working.txt");
    int row_count; int col_count; int holder;
    input >> row_count; input >> col_count; 

    for (int i =0; i < row_count; i++) {
        primary_map[i] = unordered_set<int>{}; 
        for (int j = 0; j < col_count; j++) {
            input >> holder;
            primary_map[i].insert(holder);
        }
    }
    return 0;
}

The file is below. Its header is a pair of numbers (number of rows, number of columns), followed by lines containing numbers.

sample_data_working.txt

2 2
1 2
4 5

Reproducing the Problem:

I compile my code using the following command: g++ -std=c++14 -g -O0 -fstandalone-debug working_driver.cpp -o working_driver. On macOS this actually calls clang; the version is reported as Apple clang version 15.0.0 (clang-1500.3.9.4).

I then execute lldb working_driver to get an LLDB instance started. From here I set a breakpoint on my return statement via b 23 and then execute the r command which goes forward and gets me to my breakpoint. In the terminal this should look something like:

➜  differentiation git:(main) ✗ lldb working_driver
(lldb) target create "working_driver"
Current executable set to '/Users/sidharthghoshal/cp_training/USACO/python3/chapter4/2/stall4/differentiation/working_driver' (x86_64).
(lldb) b 23
Breakpoint 1: where = working_driver`main + 465 at working_driver.cpp:23:5, address = 0x0000000100000b91
(lldb) r
Process 3140 launched: '/Users/sidharthghoshal/cp_training/USACO/python3/chapter4/2/stall4/differentiation/working_driver' (x86_64)
Process 3140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000b91 working_driver`main at working_driver.cpp:23:5
   20               primary_map[i].insert(holder);
   21           }
   22       }
-> 23       return 0;
   24   }
Target 0: (working_driver) stopped.

It is at this point that chaos ensues. p primary_map works exactly as shown at the top of this question, yet p primary_map[0] does not.

My ideas on the nature of the problem:

  1. I have seen the p primary_map[0] command work in the past on the same structure, unordered_map<int, unordered_set<int> >, but built slightly differently (namely with hardcoded upper bounds row_count, col_count in the loop). So Stack Overflow posts that say primary_map[0] is not well defined are not exactly satisfactory, because that command DOES work sometimes, just not on this instance. Somehow the "runtime-discovered loop bounds" appear to be a confounding issue here.

  2. The commonly discussed fix is to write either a custom allocator or a custom parser. What I don't understand is why I have to do this. Clearly the built-in parser that LLDB carries for unordered_map<int, unordered_set<int> > is sufficient in some cases, just not in this case. I would like to know the why. I would then deeply appreciate the how.

Some further research:

The highest rated answer here clarifies a lot. In particular, frame var primary_map[0] does behave as expected. It seems that LLDB is sometimes clever enough to use frame var for my code, but at other times it is not, and it just emits the odd symbol error. It's not clear to me what run-time defined loop bounds have to do with switching from frame var to expr, but the connection is becoming a little clearer now.


Solution

  • tl;dr

    primary_map[0] would require running library code that might not even have been compiled into the binary. The same is true even for the "simple" std::vector v{1,2,3};, where the debugger can't run v[0]. But using only language expressions, you can get there: for example, for a libstdc++ std::vector you could use v._M_impl._M_start[0] to get to the first element.

    Full answer

    I want to know why this is happening, how to prevent it in the future, and, if it can't be prevented, what the best workaround is.

    I think the reason is that, from the console, the debugger can interpret only language expressions and not library expressions. That means it understands foo[bar] if foo is a C-style array, on which the builtin operator[] works (which also implies that bar is an int-like thing), but it doesn't understand it if foo is a user-defined type.

    The reason is that if foo is a C-style array and bar is an int-like thing (I mean, something suitable as an index for a builtin array), then getting to that item in memory does not require compiling any code. It only requires the built-in rules of C/C++, which the debugger knows regardless of whether your program even exists.

    On the other hand, interpreting foo[bar] where foo is a class, like std::vector, would require the debugger to call into code that is specifically part of your program, because your program includes <vector> and uses std::vector<T>::operator[]. To call that, the debugger would need to go through your compiled (hence binary) code, look for the part of the binary that implements operator[], and call it. I'm not even sure that would be a safe thing to do, considering that in general you might be attempting such a thing not for std::vector but for another class that could have a buggy operator[] or who knows what.

    Furthermore, your program in general might not even be using operator[], so that function might not be anywhere in the binary at all. How could you run it from the console then?

    Here's a simpler example. If you compile this code with debug symbols,

    #include <vector>
    int main() {
        std::vector<int> v{17,2,3};
        return 0;
    }
    

    and you put a breakpoint on the return, how can you expect the call to v[0] to succeed, if operator[] is not even in the binary because you haven't used it?

    However, if you know how to get to the data you want by using only language expressions, then you can do that. To do so, you can look into the implementation of std::vector and work out that the actual values are stored in a C-array located at v._M_impl._M_start, so if you type v._M_impl._M_start[0]/v._M_impl._M_start[1]/v._M_impl._M_start[2] in the console, that will indeed print 17/2/3 as expected. Note that these member names are specific to libstdc++; other standard library implementations, such as libc++ on macOS, use different internal names.

    Where did I see that the values of v are stored in v._M_impl._M_start? Well, the easiest way is to put a watch on v, and the debugger will show it like this:

    Expression: v
     *- Result: size=3
       *- [0]: 17
       *- [1]: 2
       *- [2]: 3
       *+ [raw]: std::vector<int, std::allocator<int> >
    

    That raw is what you want to look into, so let's expand it:

    Expression: v
     *- Result: size=3
       *- [0]: 17
       *- [1]: 2
       *- [2]: 3
       *- [raw]: std::vector<int, std::allocator<int> >
          - std::_Vector_base<int, std::allocator<int> >: {...}
            - _M_impl: {...}
              - std::_Vector_base<int, std::allocator<int> >::_Vector_impl_data: {_M_start:0x000055555556d2b0, _M_finish:0x000055555556d2bc, ...}
                - _M_start: 0x000055555556d2b0
                 *- *_M_start: 17
                + _M_finish: 0x000055555556d2bc
                + _M_end_of_storage: 0x000055555556d2bc
    

    Here, searching for 17, you realize that it is equal to *_M_start, so _M_start is the C-array that contains the data. And that is under _M_impl, which is under v.