Tags: python, pickle, python-import, built-in

How to replace objects causing import errors with None during pickle load?


I have a pickled structure consisting of nested built-in containers (lists, dictionaries) and instances of classes that are no longer in the project and therefore cause errors during unpickling. I do not care about those objects; I only want to extract the numerical values stored in the nested structure. Is there any way to unpickle from a file and replace everything that fails to import with, say, None?

I tried subclassing Unpickler and overriding find_class(self, module, name) to return Dummy if the class cannot be found, but I keep getting TypeError: 'NoneType' object is not callable in load_reduce.

class Dummy(object):
    def __init__(self, *argv, **kwargs):
        pass

The unpickler subclass I tried looks something like this:

class RobustJoblibUnpickle(Unpickler):
    def find_class(self, _module, name):
        try:
            super(RobustJoblibUnpickle, self).find_class(_module, name)
        except ImportError:
            return Dummy
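
The TypeError above is consistent with a missing return: the try branch calls super().find_class(...) but discards the result, so find_class returns None for every class that can still be imported, and the unpickler later tries to call that None. The sketch below reproduces this under some assumptions: it uses the standard pickle.Unpickler rather than joblib's, and fractions.Fraction stands in for any class that still imports fine. The names BuggyUnpickler and try_load are illustrative only.

```python
import io
import pickle
from fractions import Fraction


class Dummy:
    def __init__(self, *args, **kwargs):
        pass


class BuggyUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        try:
            # The looked-up class is discarded, so this method returns None.
            super().find_class(module, name)
        except ImportError:
            return Dummy


def try_load(unpickler_cls, payload):
    """Return the loaded object, or the exception raised while loading."""
    try:
        return unpickler_cls(io.BytesIO(payload)).load()
    except Exception as exc:
        return exc


# Fraction pickles as "call Fraction(numerator, denominator)", so the
# unpickler has to call whatever find_class returned for it.
payload = pickle.dumps(Fraction(1, 2))
buggy_result = try_load(BuggyUnpickler, payload)
print(type(buggy_result).__name__, buggy_result)
```

Adding `return` in front of the super() call removes this particular error. A class that is missing from a module that still exists raises AttributeError rather than ImportError, so that exception probably needs to be caught as well.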

Solution

  • Maybe you can catch the exception in a try block and do what you want there (set the object to None, or use a Dummy class)?

    Edit:

    Take a look at this. I don't know whether it is the right way to do it, but it seems to work:

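    import sys
    import pickle
    # Private CPython helpers used by the copied find_class below;
    # their behaviour may differ between Python versions.
    import _compat_pickle
    from pickle import _getattribute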
    
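    class Dummy:
        def __init__(self, *args, **kwargs):
            # accept and ignore any constructor arguments stored in the pickle
            pass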
    
    class MyUnpickler(pickle._Unpickler):
        def find_class(self, module, name): # from the pickle module code but with a try
            # Subclasses may override this. # we are doing it right now...
            try:
                if self.proto < 3 and self.fix_imports:
                    if (module, name) in _compat_pickle.NAME_MAPPING:
                        module, name = _compat_pickle.NAME_MAPPING[(module, name)]
                    elif module in _compat_pickle.IMPORT_MAPPING:
                        module = _compat_pickle.IMPORT_MAPPING[module]
                __import__(module, level=0)
                if self.proto >= 4:
                    return _getattribute(sys.modules[module], name)[0]
                else:
                    return getattr(sys.modules[module], name)
            except (ImportError, AttributeError):
                # ImportError: the module is gone; AttributeError: the class is gone
                return Dummy
    
    # Edit: as per Ben's suggestion, an even simpler subclass can be used
    # instead of the one above
    
    class MyUnpickler2(pickle._Unpickler):
        def find_class(self, module, name):
            try:
                return super().find_class(module, name)
            except (ImportError, AttributeError):
                # missing module or missing class: fall back to the placeholder
                return Dummy
    
    class C:
        pass
    
    c1 = C()
    
    with open('data1.dat', 'wb') as f:
        pickle.dump(c1, f)
    
    del C # simulate the missing class
    
    with open('data1.dat', 'rb') as f:
        unpickler = MyUnpickler2(f) # or MyUnpickler(f)
        c1 = unpickler.load()
    
    print(c1) # got a Dummy object because of missing class
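
To get None rather than Dummy instances, as the question asks, one option is to post-process the loaded structure. The following is a minimal sketch, not a definitive implementation: it assumes the public pickle.Unpickler (which also supports overriding find_class), catches both ImportError (module gone) and AttributeError (class gone), and handles only nested dicts, lists, and tuples. TolerantUnpickler, strip_dummies, load_ignoring_missing, and the gone_module demo module are hypothetical names introduced for illustration.

```python
import io
import pickle
import sys
import types


class Dummy:
    """Placeholder for any class that can no longer be imported."""

    def __init__(self, *args, **kwargs):
        pass  # ignore constructor arguments stored in the pickle

    def __setstate__(self, state):
        pass  # ignore the pickled instance state


class TolerantUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            # Module is missing (ImportError) or lacks the class (AttributeError).
            return Dummy


def strip_dummies(obj):
    """Recursively replace Dummy instances with None in dicts, lists, tuples."""
    if isinstance(obj, Dummy):
        return None
    if isinstance(obj, dict):
        return {key: strip_dummies(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [strip_dummies(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(strip_dummies(item) for item in obj)
    return obj


def load_ignoring_missing(payload):
    """Unpickle bytes, turning objects of unimportable classes into None."""
    return strip_dummies(TolerantUnpickler(io.BytesIO(payload)).load())


def make_demo_payload():
    """Pickle nested data while a throwaway module exists, then remove it."""
    module = types.ModuleType("gone_module")
    legacy_cls = type("Legacy", (), {"__module__": "gone_module"})
    module.Legacy = legacy_cls
    sys.modules["gone_module"] = module
    try:
        item = legacy_cls()
        item.weight = 7
        return pickle.dumps({"a": [1, 2.5, item], "b": {"c": legacy_cls(), "d": 3}})
    finally:
        del sys.modules["gone_module"]  # the class is now unimportable


cleaned = load_ignoring_missing(make_demo_payload())
print(cleaned)
```

The placeholder only covers classes that pickle rebuilds through a constructor call plus optional state. A missing class that subclassed list or dict would likely need extra no-op methods such as append or __setitem__ on Dummy.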