Python Class: overwrite `self`

In my Python script I have a global storage (a simple global dict) that stores Processo objects. It gets filled during the execution of my program. It exists to avoid creating duplicate Processo objects, for performance reasons.

So, for the class Processo, I want to check during its creation whether it is already in the global storage.

In that case I just want to copy it to self. I am using getfromStorage() for that.

class Processo:
    def __init__(self, name, ...): # ... for simplicity
       self.processoname = name
       self = getfromStorage(self)

Don't know if it's useful but ...

def getfromStorage(processo):
    if processo.processoname in process_storage:
        return process_storage[processo.processoname]
    return processo

How do I achieve that? Am I missing something, or is my design wrong?

Solution

This pattern can't be accomplished reasonably with __init__, because __init__ only initializes an already existing object; it can't change what the caller will get. You can rebind self, but that only cuts you off from the object being created. The caller holds its own separate reference to that object, which is unaffected.
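To illustrate why rebinding self has no effect on the caller, here is a minimal sketch. The class name Rebinder and its name attribute are hypothetical, made up only for this demonstration:

```python
class Rebinder:  # hypothetical class, for illustration only
    def __init__(self, name):
        self.name = name
        # Rebinding the local name `self` changes only this local variable.
        # The object that is being initialized is left untouched.
        self = "something else entirely"


obj = Rebinder("a")
print(type(obj).__name__)  # Rebinder
print(obj.name)            # a
```

The caller still receives the original Rebinder instance, because the return value of __init__ is not used to decide what the constructor call evaluates to.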

The correct way to do this is to override the actual constructor, __new__, which allows you to return the new instance, which you may or may not create:

class Processo:
    def __new__(cls, name, ...): # ... for simplicity
       try:
           # Try to return existing instance from storage
           return getfromStorage(name)
       except KeyError:
           pass

       # No instance existed, so create new object
       self = super().__new__(cls)  # Calls parent __new__ to make empty object

       # Assign attributes as normal
       self.processoname = name

       # Optionally insert into storage here, e.g. with:
       self = process_storage.setdefault(name, self)
       # which will (at least for name of built-in type) atomically get either the newly
       # constructed self, or an instance that was inserted by another thread
       # between your original test and now
       # If you're not on CPython, or name is a user-defined type where __hash__
       # is implemented in Python and could allow the GIL to swap, then use a lock
       # around this line, e.g. with process_storage_lock: to guarantee no races

       # Return newly constructed object
       return self

To reduce overhead, I slightly rewrote getfromStorage so that it takes only the name and performs the lookup, letting the KeyError propagate if the lookup fails:

def getfromStorage(processoname):
    return process_storage[processoname]

which means that, when a cached instance can be used, no unnecessary self object need be freshly constructed.
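Putting the pieces together, a self-contained version might look roughly like the sketch below. It assumes the constructor takes only name (the other parameters elided in the original are left out), inlines the dict lookup instead of calling getfromStorage, and uses the lock variant mentioned in the comments; process_storage_lock is an assumed name, not something defined in the question:

```python
import threading

process_storage = {}                     # global cache, as in the question
process_storage_lock = threading.Lock()  # assumed name for the optional lock


class Processo:
    def __new__(cls, name):
        try:
            # Fast path: reuse an existing instance without building a new one
            return process_storage[name]
        except KeyError:
            pass

        # No cached instance, so create and initialize a fresh object
        self = super().__new__(cls)
        self.processoname = name

        # Store it; if another thread inserted one in the meantime, use theirs
        with process_storage_lock:
            return process_storage.setdefault(name, self)


a = Processo("build")
b = Processo("build")
c = Processo("deploy")
print(a is b)  # True
print(a is c)  # False
```

Both calls with "build" yield the same object, and only two entries end up in process_storage.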

Note: If you do this, it's usually a good idea not to define __init__ at all. An object is constructed by calling the class's __new__ and then implicitly calling __init__ on the result (when the result is an instance of the class). Cached instances should not be reinitialized, so you want __init__ to do nothing (the default inherited from object already does nothing); that way an instance isn't modified merely by being retrieved from the cache. Put all __init__-like behavior in the part of __new__ that constructs and returns a new object, so that it runs only for new objects.
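The reinitialization problem can be seen in a small sketch. The names Cached, _cache, and counter are hypothetical and exist only to show the effect:

```python
_cache = {}  # hypothetical cache for this demonstration


class Cached:
    def __new__(cls, name):
        if name in _cache:
            return _cache[name]
        self = super().__new__(cls)
        _cache[name] = self
        return self

    def __init__(self, name):
        # Runs on every Cached(...) call, including cache hits
        self.name = name
        self.counter = 0


first = Cached("x")
first.counter = 5
second = Cached("x")     # cache hit, but __init__ still runs on the instance
print(second is first)   # True
print(first.counter)     # 0
```

The second call returns the cached object, yet its state is reset because __init__ ran again. Moving the attribute setup into the new-object branch of __new__ avoids this.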