Search code examples
pythonclasscoding-style

What is a common way to return the result of method call in Python class?


I have a class in Python which trains a model for the given data:

class Model(object):
    def __init__(self, data):
        self.data = data
        self.result = None

    def train(self):
        ... some codes for training the model ...
        self.result = ...

Once I create a Model object,

myModel = Model(myData)

the model is not trained. Then I can call the train method to initiate the training:

myModel.train()

Then myModel.result will be updated in-place.

Also, I could re-write the train method as:

def train(self):
    ... some code for training the model ...
    result = ...
    # avoid update in-place
    trainedModel = copy.copy(self)
    trainedModel.result = result
    return trainedModel

In this way, by calling myTrainedModel = myModel.train() I have a new object and the state of the original myModel is not changed.

My question is: Which is a more common way to store the returned result from a method in a class?


Solution

  • My question is: Which is a more common way to store the returned result from a method in a class?

    It's really hard to say here. Your example narrows it down to a very specific use case, and even if it was broader, an answer perfectly devoid of subjectivity would probably be impossible to find.

    Nevertheless, I might be able to provide some info that could help you guide your decisions.

    Pure Functions

    Pure functions are functions that trigger no side-effects. They don't modify any state outside of the function. They're generally known to be some of the easiest types of functions to use correctly, since side effects are a common tripping point in development ("Which part of this system caused this state to change to this?") A function that has zero side effects offers little to trip over.

    Your second version is a pure function. It has no side effects: it returns a newly-trained Model. It doesn't affect anything that already exists.

    Pure functions are also inherently thread-safe. Since they modify no shared state, they're extremely friendly to a concurrent paradigm.

    Side Effects

    Nevertheless, a function that triggers side effects is often a practical necessity in many programs. From a single-threaded efficiency perspective, any function that faces the choice between modifying a complex state or returning a whole new one can be significantly bottlenecked by doing the latter.

    Imagine, as a gross case, a function that plots one pixel on an image returning a whole new image with a pixel plotted to it instead of modifying the one you pass in. That would tend to immediately become a significant bottleneck. On the other hand, if the result we're returning is not complex (ex: just an integer or very simple aggregate), often the pure function is even faster.

    So functions that trigger a side effect (ideally only one logical side effect to avoid becoming a confusing source of bugs) are often a practical necessity in some cases when the results are complex and expensive to create.

    Pure or "Impure"

    So the choice here kind of boils down to a pure function or "impure" function that has one side effect. Since we're dealing with an object-oriented scenario, another way to look at this is mutability vs. immutability (which often has similar differences to pure and "impure" functions). We can train a Model or create and return a trained Model without touching an existing one.

    The choice of which might be "better" would depend on what you are after. If safety and maintainability are your goals, the pure version might help a bit more. If the cost of creating and returning a new model is expensive and efficiency is your primary goal, training an existing model might help you avoid a bottleneck.

    If in doubt, I would suggest the pure version generally. Qualities like safety and maintainability that improve productivity tend to come first before worrying about performance. Later you can grab a profiler and drill down to your hotspots, and if you find that returning whole new trained models is a bottleneck, you could, say, add a new method to train a model in-place which you use for your most critical code paths.