
Object roll-back, copy-on-write, versioned proxy etc in Python


Premise: Given a Python object obj, I want to pass it along to some arbitrary function and, when the function is done, have the option to reset obj to its original state. Additionally, no actual changes can be made to obj itself, as other code may still want to access its original state.


The optimal solution should be quick in the common case where a large obj is only slightly modified. Performance for the uncommon case where an obj needs to be rolled back is less important.

Those requirements invert the trade-offs of the brute-force solution of simply copying the object: a full copy would be ridiculously slow in the common case, and super fast only for the uncommon roll-back.

The solution should generally allow the code working on the object to treat it as a normal object. This includes assigning all sorts of attributes to it, including custom classes. Obviously, the solution needs to take the entire object tree into consideration. Some concessions may be needed. Restrictions I've considered in my solutions so far include requiring all non-basic types to inherit from a special base class, or disallowing dicts and lists in favour of tuples and a custom dict class. Major arcana may be acceptable.

I've been working on this for a while, and would love to hear what ideas and suggestions more experienced Python wizards may have.


Edit: Fred's answer made me realize a missing requirement: No changes can be made to the original obj, as the original state is also valuable.


Solution

  • I've actually implemented two solutions to this question by now, and seeing as there are no other answers, I might as well share one.

    The easiest solution is to use Copy On Demand. If we have a proxy P targeting object O: P has a __getattr__ method, so that when P.x is accessed, it copies O.x and stores the copy on P as x. As a result, future access of P.x never reaches __getattr__, and modifications to P.x do not affect the original.
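    A minimal sketch of that mechanism might look as follows. The Proxy and Doc class names are my own for illustration; this version uses a plain deepcopy of each accessed attribute rather than building nested proxies, so it skips the object-tree details discussed below:

    ```python
    import copy

    class Proxy:
        """Copy-on-demand proxy: each attribute is deep-copied from the
        target the first time it is touched, so the target is never mutated."""

        def __init__(self, target):
            self._target = target  # shared original; read-only from our side

        def __getattr__(self, name):
            # Reached only when `name` is not already in the proxy's __dict__.
            value = copy.deepcopy(getattr(self._target, name))
            setattr(self, name, value)  # future lookups skip __getattr__
            return value


    class Doc:  # hypothetical example target
        def __init__(self):
            self.title = "draft"
            self.tags = ["a", "b"]

    doc = Doc()
    p = Proxy(doc)
    p.tags.append("c")   # mutates the proxy's private copy only
    p.title = "final"    # plain assignment lands on the proxy
    # doc.title is still "draft" and doc.tags is still ["a", "b"];
    # rolling back is simply discarding p.
    ```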

    There are a bunch of implementation details:

  • Maintaining a list of attributes that are deleted from P; if P is merged with O, the deleted attributes must be deleted from O.
  • Writing custom deep copying routines for any supported datatypes, such as dict, list, etc - making sure to replace all objects O with a proxy P in the copied dict, list, etc.
  • Writing ProxyDict, ProxyList etc if desired.
  • Making sure proxy chains, that is, a proxy to a proxy, work. This mostly means avoiding side effects when a proxy to a proxy checks whether an attribute exists.
  • Implementing methods for merging the proxy downwards into the proxied object, and splitting it entirely, copying in remaining data from the proxied object.
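    To make the first and last points concrete, here is one way deletion tracking and a downward merge could be sketched; again, the class and method names (Proxy, merge) are illustrative, not from any library:

    ```python
    import copy

    class Proxy:
        def __init__(self, target):
            self._target = target
            self._deleted = set()  # attribute names deleted on the proxy

        def __getattr__(self, name):
            # Deleted attributes must not be re-fetched from the target.
            if name in self._deleted:
                raise AttributeError(name)
            value = copy.deepcopy(getattr(self._target, name))
            self.__dict__[name] = value
            return value

        def __delattr__(self, name):
            # Remove the local copy (if any) and remember the deletion
            # so a later merge can replay it on the target.
            self.__dict__.pop(name, None)
            self._deleted.add(name)

        def merge(self):
            """Push local changes down into the proxied object."""
            for name in self._deleted:
                if hasattr(self._target, name):
                    delattr(self._target, name)
            for name, value in self.__dict__.items():
                if name not in ("_target", "_deleted"):
                    setattr(self._target, name, value)


    class Box:  # hypothetical example target
        pass

    box = Box()
    box.x, box.y = 1, 2
    p = Proxy(box)
    del p.x        # recorded in _deleted; box.x untouched for now
    p.y = 99       # local change on the proxy
    p.merge()      # replay deletion and changes onto box
    ```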

    Even so, compared to the complexity of the effect, it is a very easy-to-understand solution: the proxy simply copies any data that is accessed.