Tags: python, serialization, profiling, pickle, dill

How to interpret Dill's pickling trace output in Python? (Profiling (un)pickling / (de)serialization bottlenecks)


I'm trying to use dill's dill.detect.trace(True) functionality, which prints a trace during pickling, to find out why my unpickling process is so slow. I haven't found a way to output a trace for unpickling, but I am assuming unpickling is more or less the 1-to-1 reverse of pickling, so inspecting the pickling trace should reveal bottlenecks that also hold for unpickling.
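
For reference, this is roughly how I produce the trace (model below is a stand-in for my actual top-level object):

import dill
import dill.detect

dill.detect.trace(True)       # print a trace of everything dill pickles
payload = dill.dumps(model)   # "model" stands in for my actual analysis object
dill.detect.trace(False)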

I don't know how to interpret its output, however. For example:

T4: <class 'foo_package.base.model.BarModel'>
# T4
D2: <dict object at 0x7f7de3832800>
T4: <class 'pathlib.PosixPath'>
# T4
T4: <class 'foo_package.some_module.Bar'>
# T4
D2: <dict object at 0x7f7d206273c0>
T4: <class 'some_package.some_module.Bar.options.SessionOptions'>
# T4
D2: <dict object at 0x7f7d20384040>
T4: <enum 'ModelType'>
# T4
T4: <enum 'SomeOtherEnum'>
# T4
# D2
D2: <dict object at 0x7f7d206b9ac0>
# D2
T4: <class 'some_package.some_other_module.XyzzyObject'>
# T4
D2: <dict object at 0x7f7d206cb100>
###... and so on...

How should one interpret the D2 and T4 entries, and what do the # prefixes mean? Can the trace hint at cyclical references or deeply nested structures that increase (de)serialization time?

Dill's documentation on the trace function merely states:

trace(boolean)

print a trace through the stack when pickling; useful for debugging

Further context (to lower the risk of an XY problem):

I'm using dill to store a partial state of a Python program's instances in order to save an analytical model.

(Un)pickling has become very slow over time, and I'm trying to identify the cause.

For objects containing significant amounts of data, I have already delegated storage of anything with dedicated serialization support (e.g. numpy/pyarrow/pandas objects) to those formats via __getstate__ and __setstate__. This helped a little, but deserialization still takes 1 to 2 minutes, which slows down debugging considerably.
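
A plain cProfile run over dill.load at least shows where the time goes per function, though not the per-object breakdown that the pickling trace gives (a sketch; "model.pkl" stands in for my actual file):

import cProfile
import pstats

import dill

profiler = cProfile.Profile()
with open("model.pkl", "rb") as fh:   # placeholder path
    profiler.enable()
    state = dill.load(fh)             # the slow deserialization step
    profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)  # top 20 by cumulative time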


Solution

  • I'm the dill author. This is copied from the README on GitHub and PyPI...

    To aid in debugging pickling issues, use dill.detect, which provides tools like pickle tracing:

    >>> import dill.detect
    >>> dill.detect.trace(True)
    >>> f = dumps(squared)
    F1: <function <lambda> at 0x108899e18>
    F2: <function _create_function at 0x108db7488>
    # F2
    Co: <code object <lambda> at 0x10866a270, file "<stdin>", line 1>
    F2: <function _create_code at 0x108db7510>
    # F2
    # Co
    D1: <dict object at 0x10862b3f0>
    # D1
    D2: <dict object at 0x108e42ee8>
    # D2
    # F1
    >>> dill.detect.trace(False)
    

    With trace, we see how dill stored the lambda (F1) by first storing _create_function, the underlying code object (Co) and _create_code (which is used to handle code objects), then we handle the reference to the global dict (D2). A # marks when the object is actually stored.
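
    Reading the trace in the question the same way (a sketch, not from the README): a line like T4: <class ...> or D2: <dict object ...> marks the start of saving that object, the matching # T4 / # D2 line marks when it was actually stored, and the lines in between are the sub-objects pickled as part of it, so deeply nested open/close pairs mirror a deeply nested object graph. A minimal toy reproduction to watch this bracketing (names, codes, and addresses will differ per run and per dill version):

    >>> import dill
    >>> import dill.detect
    >>> class Bar:                  # toy stand-in for the question's classes
    ...     def __init__(self):
    ...         self.options = {"mode": "fast"}
    ...
    >>> dill.detect.trace(True)
    >>> s = dill.dumps(Bar())       # each opening line is later closed by a matching '# ...' line
    >>> dill.detect.trace(False)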