python pickle dill

What is the proper way to make an object with unpicklable fields picklable?


My current approach is to detect which fields are unpicklable and convert them to strings. (I could delete them instead, but then the result would falsely suggest the field never existed; I'd rather keep the field, just as a string.) Is there a less hacky, more official way to do this?

Current code I use:

from argparse import Namespace
from typing import Any

import dill


def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpicklable objects as strings.

    note: implementation not tested against deep copying.
    ref:
        - https://stackoverflow.com/questions/70128335/what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    pickable_args = Namespace()
    # - go through the fields in args; if a value is not picklable make it a string, else leave it as is
    # The vars() function returns the __dict__ attribute of the given object.
    for field in vars(args):
        field_val: Any = getattr(args, field)
        if not dill.pickles(field_val):
            field_val = str(field_val)
        setattr(pickable_args, field, field_val)
    return pickable_args

Context: I mostly do this to drop the annoying TensorBoard object I carry around (though I probably won't need the .tb field anymore thanks to wandb / Weights & Biases). Not that this matters much, but context is always nice.

Edit:

I decided to move away from dill, since it sometimes cannot recover classes/objects (probably because it cannot save their code or something), and to use only pickle (which seems to be the recommended approach in PyTorch).

So what is the official (perhaps optimized) way to check whether an object is picklable, using only the standard pickle module and not dill?

Is this the best way?

import pickle


def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

Thus, my current solution:

from argparse import Namespace
from typing import Any

import dill


def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpicklable objects as strings.

    note: implementation not tested against deep copying.
    ref:
        - https://stackoverflow.com/questions/70128335/what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    pickable_args = Namespace()
    # - go through the fields in args; if a value is not picklable make it a string, else leave it as is
    # The vars() function returns the __dict__ attribute of the given object.
    for field in vars(args):
        field_val: Any = getattr(args, field)
        # - if the current field value fails either the dill or the pickle check, cast it to a string
        if not dill.pickles(field_val) or not is_picklable(field_val):
            field_val = str(field_val)
        # - after this line the invariant is that it should be picklable, so set it in the new args obj
        setattr(pickable_args, field, field_val)
    return pickable_args


def make_opts_pickable(opts):
    """Makes a namespace picklable."""
    return make_args_pickable(opts)


def is_picklable(obj: Any) -> bool:
    """
    Checks if something is picklable.

    Ref:
        - https://stackoverflow.com/questions/70128335/what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    import pickle
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

Note: one of the reasons I want something "official"/tested is that PyCharm halts on the try/except; see "How to stop PyCharm's break/stop/halt feature on handled exceptions (i.e. only break on python unhandled exceptions)?". That is not what I want: I want it to halt only on unhandled exceptions.


Solution

    What is the proper way to make an object with unpicklable fields picklable?

    I believe the answer to this belongs in the question you linked, "Python - How can I make this un-pickleable object pickleable?". I've added a new answer to that question explaining how you can make an unpicklable object picklable the proper way, without using __reduce__.

    So what is the official (perhaps optimized) way to check for pickables without dill or with the official pickle?

    Objects that are picklable are defined in the docs as follows:

    • None, True, and False
    • integers, floating point numbers, complex numbers
    • strings, bytes, bytearrays
    • tuples, lists, sets, and dictionaries containing only picklable objects
    • functions defined at the top level of a module (using def, not lambda)
    • built-in functions defined at the top level of a module
    • classes that are defined at the top level of a module
    • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
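
A few of these rules can be checked empirically. The following is only a sketch: `can_pickle` and `top_level` are hypothetical names introduced for illustration, and the sample objects are arbitrary.

```python
import pickle


def top_level(x):
    """Defined with def at module level, so pickle can store it by name."""
    return x + 1


def can_pickle(obj) -> bool:
    """Hypothetical helper: True if pickle.dumps succeeds, False on any error."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


# One sample per rule in the list above, plus two lambda cases that break the rules.
checks = {
    "None": None,
    "numbers": (1, 2.5, 3 + 4j),
    "text and bytes": ["s", b"b", bytearray(b"ba")],
    "nested containers": {"k": [1, (2, 3), {4, 5}]},
    "built-in function": len,
    "top-level def": top_level,
    "lambda": lambda x: x + 1,
    "list containing a lambda": [1, lambda: None],
}

for label, obj in checks.items():
    print(f"{label:26} {can_pickle(obj)}")
```

Run as a script, the first six entries should report True and the two lambda entries False, which also shows that a container is only as picklable as its least picklable element.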

    The tricky parts are (1) knowing how functions/classes are defined (you can probably use the inspect module for that) and (2) recursing through objects, checking against the rules above.

    There are a lot of caveats to this, such as the pickle protocol version, and whether the object is an extension type (defined in a C extension like numpy, for example) or an instance of a 'user-defined' class. Usage of __slots__ can also affect whether an object is picklable (since __slots__ means there's no __dict__): under the older protocols 0 and 1, such an object generally needs to define __getstate__ to be pickled. Some objects may also be registered with a custom function for pickling, so you'd need to know whether that has happened as well.
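
To illustrate the protocol and __slots__ caveat, here is a minimal sketch. The classes `SlotsOnly` and `SlotsWithState` and the helper `pickles_with` are hypothetical, made up for this example; custom registration (for example via copyreg) is not covered.

```python
import pickle


class SlotsOnly:
    """Hypothetical class: uses __slots__ and defines no __getstate__."""
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


class SlotsWithState(SlotsOnly):
    """Same layout, but with explicit state hooks."""
    __slots__ = ()

    def __getstate__(self):
        return {"x": self.x}

    def __setstate__(self, state):
        self.x = state["x"]


def pickles_with(obj, protocol: int) -> bool:
    """Hypothetical helper: True if obj can be dumped with the given protocol."""
    try:
        pickle.dumps(obj, protocol=protocol)
        return True
    except Exception:
        return False


for proto in (0, pickle.HIGHEST_PROTOCOL):
    print(proto, pickles_with(SlotsOnly(1), proto), pickles_with(SlotsWithState(1), proto))
```

On CPython, the class without __getstate__ should fail under protocol 0 and succeed under the highest protocol, while the class with the state hooks should succeed under both. The same object can therefore be "picklable" or not depending on the protocol you ask for.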

    Technically, you can implement a function to check for all of this in Python, but it will be quite slow by comparison. The easiest (and probably most performant, as pickle is implemented in C) way to do this is to simply attempt to pickle the object you want to check.

    I tested this in PyCharm by pickling all kinds of things, and it doesn't halt with this method. The key is that you must anticipate pretty much any kind of exception (see footnote 3 in the docs). The warnings are optional; they're mostly there to explain what happened in the context of this question.

    import pickle
    import warnings
    from typing import Any


    def is_picklable(obj: Any) -> bool:
        try:
            pickle.dumps(obj)
            return True
        except (pickle.PicklingError, pickle.PickleError, AttributeError, ImportError):
            # https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
            return False
        except RecursionError:
            warnings.warn(
                f"Could not determine if object of type {type(obj)!r} is picklable "
                "due to a RecursionError that was suppressed. "
                "Setting a higher recursion limit MAY allow this object to be pickled"
            )
            return False
        except Exception as e:
            # https://docs.python.org/3/library/pickle.html#id9
            warnings.warn(
                "An error occurred while attempting to pickle "
                f"object of type {type(obj)!r}. Assuming it's unpicklable. The exception was {e}"
            )
            return False
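
To see why catching only pickle.PicklingError is too narrow, the sketch below records which exception type each object raises. `failure_type` is a hypothetical helper and the sample objects are arbitrary; the exact exception raised for some objects can differ between Python versions.

```python
import pickle
import threading


def failure_type(obj):
    """Hypothetical helper: name of the exception pickle.dumps raises, or None."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return type(exc).__name__
    return None


samples = {
    "int": 42,
    "lock": threading.Lock(),
    "generator": (i for i in range(3)),
    "lambda": lambda: 0,
}

for label, obj in samples.items():
    print(label, failure_type(obj))
```

On CPython the lock and the generator raise TypeError, which is not covered by any of the exception types named in the first except clause above, so such objects end up in the generic `except Exception` branch.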
    

    Using the example from my other answer linked above, you could make your object picklable by implementing __getstate__ and __setstate__ (or by subclassing and adding them, or by making a wrapper class), adapting the logic of your make_args_pickable:

    from argparse import Namespace


    class Unpicklable:
        """
        A simple marker class so we can distinguish when a deserialized object
        is a string because it was originally unpicklable
        (and not simply a string to begin with)
        """
        def __init__(self, obj_str: str):
            self.obj_str = obj_str

        def __str__(self):
            return self.obj_str

        def __repr__(self):
            return f'Unpicklable(obj_str={self.obj_str!r})'


    class PicklableNamespace(Namespace):
        def __getstate__(self):
            """For serialization"""

            # always make a copy so you don't accidentally modify state
            state = self.__dict__.copy()

            # Any unpicklables will be converted to an ``Unpicklable`` object
            # with its str format stored in the object
            for key, val in state.items():
                if not is_picklable(val):
                    state[key] = Unpicklable(str(val))
            return state

        def __setstate__(self, state):
            self.__dict__.update(state)  # or leave unimplemented
    

    In action, I'll pickle a namespace whose attributes contain a file handle (normally not picklable) and then load the pickle data.

    # Normally file handles are not picklable
    p = PicklableNamespace(f=open('test.txt'))
    
    data = pickle.dumps(p)
    del p
    
    loaded_p = pickle.loads(data)
    print(loaded_p)
    # PicklableNamespace(f=Unpicklable(obj_str="<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>"))
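
After loading, you may want to know which fields were swapped for markers. The sketch below restates the classes above in condensed form so that it runs standalone, uses a threading.Lock in place of the file handle so that no test.txt is needed, and adds a hypothetical helper, `replaced_fields`, that is not part of the original answer.

```python
import pickle
import threading
from argparse import Namespace


class Unpicklable:
    """Condensed version of the marker class above."""
    def __init__(self, obj_str: str):
        self.obj_str = obj_str


def is_picklable(obj) -> bool:
    """Condensed check: any failure counts as unpicklable."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


class PicklableNamespace(Namespace):
    def __getstate__(self):
        # Build a new dict so the live namespace is left untouched.
        return {
            key: val if is_picklable(val) else Unpicklable(str(val))
            for key, val in self.__dict__.items()
        }


def replaced_fields(ns: Namespace) -> list:
    """Hypothetical helper: names of attributes that were swapped for markers."""
    return sorted(k for k, v in vars(ns).items() if isinstance(v, Unpicklable))


args = PicklableNamespace(lr=0.1, lock=threading.Lock(), name="run-1")
loaded = pickle.loads(pickle.dumps(args))
print(replaced_fields(loaded))  # ['lock']
```

The original namespace still holds the real lock; only the pickled copy carries the marker, so picklable fields such as `lr` and `name` round-trip unchanged.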