dictionary tcl allocation

Does a Tcl nested dictionary use references to references, and so avoid capacity issues?


According to the thread "TCL max size of array", Tcl cannot have more than 128M list/dictionary elements. However, one could have a nested dictionary whose total number of values (across its different levels) exceeds that number.
Now, does a nested dictionary use references by design? That would mean that as long as no single level of the dictionary tree holds more than 128M elements, you should be fine. Is that true? Thanks.


Solution

  • The current limitation is that no individual memory object (C struct or array) can be larger than 2GB. This is because the high-performance memory allocator (and a few other key APIs) uses a signed 32-bit integer for the size of the memory chunk to allocate.

    This wasn't a significant limitation on a 32-bit machine, where the OS itself would usually restrict you at about the time you started to near that limit. However, on a 64-bit machine it's possible to address much more memory, while at the same time the size of pointers is doubled. For example, 2GB of space means about 256M elements for a list, since each element needs at least one pointer to hold the reference to the value inside it. In addition, the reference counter system might well hit a limit in such a scheme, though that wouldn't be the problem here.
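
    The arithmetic behind that estimate can be checked with a rough sketch like the one below. It assumes the element storage is a plain array of pointers with nothing else sharing the allocation (real Tcl lists also carry a small header), so treat the result as an upper bound rather than an exact limit. The variable names are just for illustration.

    ```tcl
    # Rough upper bound on list elements under a 2GB single-allocation cap.
    # Assumption: one pointer per element and no other overhead.
    set maxBytes [expr {2**31 - 1}]            ;# largest signed 32-bit size
    set ptrSize  $::tcl_platform(pointerSize)  ;# 4 on 32-bit builds, 8 on 64-bit
    set maxElems [expr {$maxBytes / $ptrSize}]
    puts "pointer size: $ptrSize bytes, at most about $maxElems elements"
    ```

    With 8-byte pointers this comes out to roughly 268 million, i.e. about 256M elements.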

    If you create a nested structure, the total number of leaf memory objects held within it can be much larger, but you need to take great care never to get the string serialisation of the list or dictionary, since that would probably hit the 2GB hard limit. If you're really handling very large numbers of values, you might want to consider using a database like SQLite as storage instead, as that can be transparently backed by disk.
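
    As a minimal sketch of working with a nested structure level by level, the example below builds a tiny two-level dictionary and counts its leaves using only `dict` operations. The sizes and key names (`bucket0`, `item0`, and so on) are purely illustrative; the point is that each inner dictionary is its own value, and nothing here asks for the whole tree as one string.

    ```tcl
    # Build a two-level dictionary: outer keys map to inner dictionaries.
    # Sizes are tiny and only for illustration.
    set tree [dict create]
    for {set i 0} {$i < 3} {incr i} {
        for {set j 0} {$j < 4} {incr j} {
            dict set tree bucket$i item$j [expr {$i * 10 + $j}]
        }
    }

    # Count leaves one level at a time using dict operations only.
    set leaves 0
    dict for {bucket inner} $tree {
        incr leaves [dict size $inner]
    }
    puts "outer entries: [dict size $tree], leaves: $leaves"

    # Avoid anything that needs the whole value as a single string, e.g.
    #   puts $tree
    #   string length $tree
    # since generating that serialisation is where a very large
    # structure would be expected to run into the 2GB cap.
    ```

    For this toy input the script reports 3 outer entries and 12 leaves.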


    Fixing the problem is messy because it impacts a lot of the API and ABI, and creates a lot of wreckage in the process (plus a few subtle bugs if not done carefully, IIRC). We'll fix it in Tcl 9.0.