Tags: python, default, mutable, namedtuple, python-dataclasses

Are Python's NamedTuple return structures one of the few places where mutable defaults should be used?


Methods for returning structures from Python functions have been discussed at length in various posts; two good ones are here and here.

However, unless I have missed it, none of the proposed solutions define the structure in the same place where its members are set. Instead, they either repeat the list of members on assignment (not DRY) or rely on position (error-prone).
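To make the two failure modes concrete, here is a minimal sketch with a hypothetical two-field result type (`Result`, `positional` and `by_keyword` are illustrative names, not from the original posts). Positional construction silently accepts swapped arguments, while keyword construction is safe but spells out every field name a second time.

```python
from collections import namedtuple

# Hypothetical result type, defined once at module level (illustration only).
Result = namedtuple('Result', ['foo', 'bar'])


def positional(foo: float, bar: float) -> Result:
    # Relies on position: the swapped arguments below raise no error.
    return Result(bar, foo)  # bug: foo and bar are misaligned


def by_keyword(foo: float, bar: float) -> Result:
    # Safe, but every field name is written again (once here, once in Result).
    return Result(foo=foo, bar=bar)


print(positional(1.0, 2.0))  # Result(foo=2.0, bar=1.0)
print(by_keyword(1.0, 2.0))  # Result(foo=1.0, bar=2.0)
```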

I am looking for a DRY way to do this, both for writing speed and to avoid the argument-misalignment errors that are common when you repeat yourself.

The code snippet below shows three attempts at this. For brevity's sake, the example structure contains only one element, but the intention is that real structures contain multiple elements.

All three methods are DRY: each embeds the structure definition in the initialization of the returned instance.

Method 1 highlights the need for a better way, but it illustrates the sought-after DRY syntax, where the structure and how it should be populated (decided at run time) are in the same place, namely the dict() call.

Method 2 uses typing.NamedTuple and seems to work. However, it relies on mutable defaults to do so.

Method 3 follows Method 2's approach, using dataclasses.dataclass rather than typing.NamedTuple. It fails because dataclasses explicitly prohibit mutable defaults, raising ValueError: mutable default <class 'list'> for field foo_bar is not allowed: use default_factory

from collections import namedtuple
from dataclasses import dataclass
from typing import NamedTuple, List, Tuple

# Method 1
def ret_dict(foo_: float, bar_: float) -> Tuple:
    return_ = dict(foo_bar=[foo_, bar_])
    _ = namedtuple('_', return_.keys())
    return _(*return_.values())


# Method 2
def ret_nt(foo_: float, bar_: float) -> 'ReturnType':
    class ReturnType(NamedTuple):
        foo_bar: List[float] = [foo_, bar_]     # Mutable default value allowed
    return ReturnType()


# Method 3
def ret_dc(foo_: float, bar_: float) -> 'ReturnType':
    @dataclass
    class ReturnType:
        foo_bar: List[float] = [foo_, bar_]   # raises ValueError: mutable default is not allowed
    return ReturnType()


def main():
    rt1 = ret_dict(1, 0)
    rt1.foo_bar.append(3)
    rt2 = ret_dict(2, 0)
    print(rt1)
    print(rt2)

    rt1 = ret_nt(1, 0)
    rt1.foo_bar.append(3)   # amending the mutable default does not affect subsequent calls
    rt2 = ret_nt(2, 0)
    print(rt1)
    print(rt2)

    rt1 = ret_dc(1, 0)     # raises ValueError, so the lines below never run
    rt1.foo_bar.append(3)  # intent: amending the default would not affect subsequent calls
    rt2 = ret_dc(2, 0)
    print(rt1)
    print(rt2)


if __name__ == "__main__":
    main()

The following questions arise:

Is Method 2 a sensible, Pythonic approach?

One concern is that mutable defaults are somewhat taboo, certainly for function arguments. I wonder whether their use here is OK, however, given that the attached code suggests that these NamedTuple defaults (and perhaps the entire ReturnType definition) are evaluated on every function call. Function argument defaults, by contrast, are evaluated only once, when the def statement runs, and persist across all later calls (hence the problem).
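The difference described above can be checked directly. In this sketch, `append_to` is a hypothetical helper showing the classic shared function default; `ret_nt` is Method 2 without the string return annotation. The function default is created once and shared, while the class body (and therefore the NamedTuple default) runs again on each call.

```python
from typing import List, NamedTuple


def append_to(item: int, bucket: List[int] = []) -> List[int]:
    # Hypothetical helper: the default list is created once, when `def` runs,
    # and the same list object is reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def ret_nt(foo_: float, bar_: float):
    # Method 2: the class statement, including the default list, is executed
    # on every call, so each call gets a new class with a new default list.
    class ReturnType(NamedTuple):
        foo_bar: List[float] = [foo_, bar_]
    return ReturnType()


print(append_to(1))  # [1]
print(append_to(2))  # [1, 2]  (same list as the first call)

rt1 = ret_nt(1, 0)
rt1.foo_bar.append(3)
rt2 = ret_nt(2, 0)
print(rt1.foo_bar)  # [1, 0, 3]
print(rt2.foo_bar)  # [2, 0]
```

Note that within a single call the default is still shared: two `ReturnType()` instances created from the same class object would share one list. Method 2 only looks safe because each call builds a fresh class.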

A further concern is that the dataclasses module seems to have gone out of its way to prohibit this usage explicitly. Was that decision overly dogmatic in this instance, or is warding against Method 2 warranted?
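For reference, here is a small sketch of what dataclasses reject and what they offer instead. Up to Python 3.10 the check targets list, dict and set defaults; from 3.11 any unhashable default is rejected. The sanctioned replacement is `field(default_factory=...)`, which calls the factory once per instance. The class names `Bad` and `ReturnType` here are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

rejected = False
try:
    @dataclass
    class Bad:
        foo_bar: List[float] = []  # list default: rejected when the class is created
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)


@dataclass
class ReturnType:
    # default_factory is called for every new instance, so no list is shared.
    foo_bar: List[float] = field(default_factory=list)


a = ReturnType()
b = ReturnType()
a.foo_bar.append(3)
print(a, b)  # ReturnType(foo_bar=[3]) ReturnType(foo_bar=[])
```

The rule exists because a plain default is stored once on the class and reused by every instance's `__init__`, which is exactly the sharing problem of mutable function defaults.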

Is this inefficient?

I would be happy if the syntax of Method 2 meant:

1 - Define ReturnType once on the first pass only

2 - call __init__() with the given (dynamically set) initialization on every pass

However, I am afraid that it may instead mean the following:

1 - Define ReturnType and its defaults on every pass

2 - call __init__() with the given (dynamically set) initialization on every pass
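The second reading can be confirmed by comparing the types of two results. This sketch reuses Method 2 (without the string annotation): if `ReturnType` were defined only once, both instances would share one class.

```python
from typing import List, NamedTuple


def ret_nt(foo_: float, bar_: float):
    class ReturnType(NamedTuple):
        foo_bar: List[float] = [foo_, bar_]
    return ReturnType()


rt1 = ret_nt(1, 0)
rt2 = ret_nt(2, 0)

# Each call executed the class statement again, producing a distinct class.
print(type(rt1) is type(rt2))      # False
print(isinstance(rt2, type(rt1)))  # False
```

A side effect is that `isinstance` checks against a class obtained from one call fail for results from any other call.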

Should one be concerned about the inefficiency of re-defining chunky ReturnTypes on every pass when the call is in a "tight" loop? Isn't this inefficiency present whenever a class is defined inside a function? Should classes be defined inside functions?
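A class statement is executed every time control reaches it, so any class defined inside a function body is rebuilt on every call. One rough way to gauge the cost is a `timeit` comparison; `ModuleReturnType`, `ret_module` and `ret_local` are hypothetical names for this sketch, and the absolute numbers depend entirely on the machine and Python version.

```python
import timeit
from typing import List, NamedTuple


class ModuleReturnType(NamedTuple):
    # Hypothetical module-level type: created once, at import time.
    foo_bar: List[float]


def ret_module(foo_: float, bar_: float) -> ModuleReturnType:
    return ModuleReturnType(foo_bar=[foo_, bar_])


def ret_local(foo_: float, bar_: float):
    # Hypothetical Method 2 variant: builds a new namedtuple class on every call.
    class ReturnType(NamedTuple):
        foo_bar: List[float] = [foo_, bar_]
    return ReturnType()


n = 1_000
t_module = timeit.timeit(lambda: ret_module(1.0, 0.0), number=n)
t_local = timeit.timeit(lambda: ret_local(1.0, 0.0), number=n)
print(f"module-level class: {t_module / n * 1e6:.2f} us per call")
print(f"class per call:     {t_local / n * 1e6:.2f} us per call")
```

Both functions return equal tuples; the difference lies only in how much work happens before the instance is created.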

Is there a (hopefully good) way to achieve DRY definition-instantiation using the new dataclasses module (Python 3.7)?
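Two possible dataclass-based answers, sketched under the assumption that per-call class creation is acceptable. The first makes Method 3 legal by passing a lambda that closes over the arguments as `default_factory`. The second, `ret_mdc` (a hypothetical name), mirrors Method 1 using `dataclasses.make_dataclass`, taking field names from the dict and passing values by keyword.

```python
from dataclasses import dataclass, field, make_dataclass
from typing import List


def ret_dc(foo_: float, bar_: float):
    # Method 3 made legal: default_factory takes a zero-argument callable, so a
    # lambda closing over the arguments builds a fresh list for each instance.
    @dataclass
    class ReturnType:
        foo_bar: List[float] = field(default_factory=lambda: [foo_, bar_])
    return ReturnType()


def ret_mdc(foo_: float, bar_: float):
    # Hypothetical Method 1 analogue: names come from the dict and values are
    # passed by keyword, so nothing is listed twice and nothing is positional.
    values = dict(foo_bar=[foo_, bar_])
    ReturnType = make_dataclass('ReturnType', values.keys())
    return ReturnType(**values)


rt1 = ret_dc(1, 0)
rt1.foo_bar.append(3)
print(rt1.foo_bar, ret_dc(2, 0).foo_bar)
print(ret_mdc(1, 0))
```

Both variants still create a new class on every call, so the efficiency and typing concerns about Method 2 apply to them as well.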

Finally, is there a better DRY definition-instantiation syntax?


Solution

> However, I am afraid that it may instead mean the following:
>
> 1 - Define ReturnType and its defaults on every pass
>
> 2 - call __init__() with the given (dynamically set) initialization on every pass

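That's what it means, and it costs a lot of time and space: the class statement runs on every call, so a brand-new class object is built each time. It also makes your annotations invalid: the string annotation -> 'ReturnType' is resolved in the module's global namespace, so it requires a ReturnType definition at module level. Finally, it breaks pickling, because pickle cannot look up a class that only exists inside a function call.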
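Both problems can be reproduced in a few lines. This sketch assumes it is run as a script; `ret_module`, `ret_local` and `LocalReturnType` are hypothetical names. The exact exception raised for the local class differs across Python versions (AttributeError in older releases, PicklingError in newer ones), so both are caught.

```python
import pickle
import typing
from typing import List, NamedTuple


class ReturnType(NamedTuple):
    # Module-level: pickle can find this class again by module and name.
    foo_bar: List[float]


def ret_module(foo_: float, bar_: float) -> ReturnType:
    return ReturnType(foo_bar=[foo_, bar_])


def ret_local(foo_: float, bar_: float) -> 'LocalReturnType':
    # Same shape as Method 2: the class only exists inside this call.
    class LocalReturnType(NamedTuple):
        foo_bar: List[float] = [foo_, bar_]
    return LocalReturnType()


# Round trip works for the module-level class.
restored = pickle.loads(pickle.dumps(ret_module(1.0, 0.0)))
print(restored)  # ReturnType(foo_bar=[1.0, 0.0])

# Pickling an instance of the function-local class fails.
local_pickle_failed = False
try:
    pickle.dumps(ret_local(1.0, 0.0))
except (pickle.PicklingError, AttributeError) as exc:
    local_pickle_failed = True
    print(type(exc).__name__, exc)

# The string annotation is looked up in module globals, where it is missing.
annotation_failed = False
try:
    typing.get_type_hints(ret_local)
except NameError as exc:
    annotation_failed = True
    print('NameError:', exc)
```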

Stick with a module-level ReturnType and don't use mutable defaults. Or, if all you want is member access by dot notation and you don't really care about making a meaningful type, just use types.SimpleNamespace:

    return types.SimpleNamespace(thing=whatever, other_thing=stuff)
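Applied to the question's example, the two suggestions might look like the sketch below (function names are illustrative). The module-level NamedTuple is defined once, builds the list per call instead of using a mutable default, and uses keyword arguments to avoid misalignment, at the cost of naming each field twice. SimpleNamespace names each attribute only once, at the call site, but the result is mutable, is not a tuple, and has no fixed set of fields for type checkers to see.

```python
import types
from typing import List, NamedTuple


class ReturnType(NamedTuple):
    # Defined once; no default, the list is built per call below.
    foo_bar: List[float]


def ret_nt(foo_: float, bar_: float) -> ReturnType:
    # Keyword arguments guard against positional misalignment.
    return ReturnType(foo_bar=[foo_, bar_])


def ret_ns(foo_: float, bar_: float) -> types.SimpleNamespace:
    # No class to maintain: each attribute name appears exactly once.
    return types.SimpleNamespace(foo_bar=[foo_, bar_])


print(ret_nt(1, 0))  # ReturnType(foo_bar=[1, 0])
print(ret_ns(1, 0).foo_bar)  # [1, 0]
```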