Tags: python, python-typing, deep-copy, python-dataclasses

Modifying a dataclass object such that only specified elements are overridden


I would like to create a dataclass A with many members. This dataclass should not have Optional members, so that the full information is guaranteed to be available in the object.

Then I want to have a "modification option", which has the same members as A, but as optional members.

What would be the best way to do that without needing to write the members in two different classes?

Here is my approach (a working example):

from copy import deepcopy
from dataclasses import dataclass
from typing import Optional

@dataclass
class A:
    x: int
    y: int


@dataclass
class A_ModificationOptions:
    x: Optional[int] = None
    y: Optional[int] = None


def modifyA(original: A, modification: A_ModificationOptions):
    if modification.x is not None:
        original.x = deepcopy(modification.x)

    if modification.y is not None:
        original.y = deepcopy(modification.y)


original_A = A(x=1, y=2)
print("A before modification: ", original_A) # A(x=1, y=2)

modification_A = A_ModificationOptions(y=7)
modifyA(original_A, modification_A)
print("A after modification: ", original_A) # A(x=1, y=7)

This code fulfills the following requirements:

  1. The original A has no optional members, so all must have been set.
  2. In the modification of A, only the members that need to change have to be set.

This code does not fulfill the following requirements:

  1. I don't want to "copy" each member of A into A_ModificationOptions again.
  2. If possible, I'd rather use something built in than write my own modifyA() function.
  3. If 2 is not possible, I don't want to add two lines to modifyA for every member of A.

Is there a neat way to store sparse "Modification Options" for a potentially huge dataclass?

Use case: a user creates a full list once and then, in different scenarios, experiments with deltas to that full list. Those deltas must also be stored somehow, so I thought about an original full-list class A and a "delta" class A_ModificationOptions, but I hope there is a neater way to do this. Maybe something like a smart deepcopy?
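Because modifyA mutates its argument, one way to keep the full list intact across scenarios is to apply each stored delta to a deep copy of it. A minimal sketch reusing the classes from the question; apply_scenario and the scenario names are hypothetical:

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Optional


@dataclass
class A:
    x: int
    y: int


@dataclass
class A_ModificationOptions:
    x: Optional[int] = None
    y: Optional[int] = None


def modifyA(original: A, modification: A_ModificationOptions) -> None:
    # Same in-place logic as the question's modifyA.
    if modification.x is not None:
        original.x = deepcopy(modification.x)
    if modification.y is not None:
        original.y = deepcopy(modification.y)


def apply_scenario(base: A, delta: A_ModificationOptions) -> A:
    # Hypothetical helper: modify a deep copy so `base` stays untouched
    # and can be reused for every scenario.
    result = deepcopy(base)
    modifyA(result, delta)
    return result


base = A(x=1, y=2)
# Hypothetical scenarios; each one stores only its sparse delta.
scenarios = {
    "more_y": A_ModificationOptions(y=7),
    "more_x": A_ModificationOptions(x=10),
}
results = {name: apply_scenario(base, delta) for name, delta in scenarios.items()}
print(results)  # {'more_y': A(x=1, y=7), 'more_x': A(x=10, y=2)}
print(base)     # A(x=1, y=2)
```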

Update 1:

Thank you @wjandrea for your feedback! Your solution for point 3 did not consider more deeply nested dataclasses, so I used your suggestion to make it work for nested dataclasses. The code below now solves point 3:

from copy import deepcopy
from dataclasses import dataclass, is_dataclass
from typing import Optional


class Original:
    pass


@dataclass
class B(Original):
    a1: int
    a2: int
    a3: int


@dataclass
class A(Original):
    x: int
    y: int
    b: B


class Modification:
    pass


@dataclass
class B_Mod(Modification):
    a1: Optional[int] = None
    a2: Optional[int] = None
    a3: Optional[int] = None


@dataclass
class A_Mod(Modification):
    x: Optional[int] = None
    y: Optional[int] = None
    b: Optional[B_Mod] = None


def modifyDataclass(original: Original, modification: Modification):
    assert is_dataclass(original) and is_dataclass(modification)

    for k, v in vars(modification).items():
        if is_dataclass(v):
            assert isinstance(v, Modification)

            modifyDataclass(original=getattr(original, k), modification=v)

            # Move on to the next field; a `return` here would silently
            # skip every field declared after the nested one.
            continue

        if v is not None:
            setattr(original, k, v)


original_A = A(x=1, y=2, b=B(a1=3, a2=4, a3=5))
print(
    "A before modification: ", original_A
)  # A(x=1, y=2, b=B(a1=3, a2=4, a3=5))

modification_A = A_Mod(y=7, b=B_Mod(a2=19))
modifyDataclass(original_A, modification_A)
print(
    "A after modification: ", original_A
)  # A(x=1, y=7, b=B(a1=3, a2=19, a3=5))

A solution for points 1 and 2 would be amazing!

Maybe also via inheritance? For example, A_Mod could be a child of A, but with all members switched to Optional.


Solution

  • I think I understand what you want, and here's a way to dynamically generate the A_ModificationOptions class. As noted in the comments, this will never pass a static type checker. If you want to run something like mypy or pyright on this, you're going to have to Any out the modification options. This is very dynamic reflection in Python.

    Now, a couple of notes. dataclass is a decorator, and like any decorator, it's just being applied to a class after the fact. That is,

    @dataclass
    class X:
        ...
    

    is just

    class X:
        ...
    X = dataclass(X)
    

    So we can call dataclass like an ordinary Python function on a class we make up if we so choose. And while we're on the topic, we can make classes using ordinary Python too. type has a three-argument form which acts as a constructor for new classes.

     class type(name, bases, dict, **kwds)
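To make both points concrete, here is a small sketch (the class names P1, P2 and P3 are made up for illustration). All three end up as equivalent dataclasses; the last one is built entirely at runtime with the three-argument form of type, supplying __annotations__ by hand:

```python
from dataclasses import dataclass, fields


# Decorator syntax...
@dataclass
class P1:
    x: int


# ...is the same as calling dataclass() on an ordinary class.
class P2:
    x: int


P2 = dataclass(P2)

# type(name, bases, dict) creates the class; dataclass() then reads the
# hand-made __annotations__ dict exactly as it would for class syntax.
P3 = dataclass(type("P3", (), {"__annotations__": {"x": int}}))

for cls in (P1, P2, P3):
    print(cls.__name__, [f.name for f in fields(cls)], cls(x=1))
# P1 ['x'] P1(x=1)
# P2 ['x'] P2(x=1)
# P3 ['x'] P3(x=1)
```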
    

    So let's see how we actually do that. We'll need dataclass and fields. I also import Optional to get a technically correct annotation, though it doesn't affect the semantics.

    from dataclasses import dataclass, fields
    from typing import Optional
    

    Now the magic sauce, commented for your convenience.

    def make_modification_dataclass(original_dataclass, new_class_name=None):
        # Provide a default name if the caller doesn't supply a custom
        # name for the new class.
        if new_class_name is None:
            new_class_name = original_dataclass.__name__ + "_ModificationOptions"
        # This actually creates the class. @dataclass is going to look at
        # the __annotations__ field on the class, which is normally
        # generated by writing type annotations in Python code. But it's
        # explicitly defined to be a mutable dictionary, so we're well
        # within our rights to create and mutate it ourselves.
        new_class = type(new_class_name, original_dataclass.__bases__, {
            "__annotations__": {}
        })
        # Iterate over all of the fields of the original dataclass.
        for field in fields(original_dataclass):
            # For each field, put a type in __annotations__. The type
            # could be anything as far as @dataclass is concerned, but we
            # make it Optional[whatever], which is actually the correct
            # type. No static type checker will ever see this, but other
            # tools that analyze __annotations__ at runtime will see a
            # correct type annotation.
            new_class.__annotations__[field.name] = Optional[field.type]
            # We also need to set the attribute itself on the class. This
            # is the "= None" part of the A_ModificationOptions class you
            # wrote, and it will show @dataclass what the default value of
            # the field should be.
            setattr(new_class, field.name, None)
        # Apply the decorator and return our brand new class.
        return dataclass(new_class)
    

    To use it, we just pass the original class and assign the result to a name.

    @dataclass
    class A:
        x: int
        y: int
    
    # This is making a class. A real, genuine dataclass.
    A_ModificationOptions = make_modification_dataclass(A)
    

    Your modifyA function is kind of close to dataclasses.replace, but the latter (a) takes the changes as keyword arguments rather than as a modification object, and (b) returns a new instance rather than mutating in place. Fortunately, it's fairly straightforward to write our own.
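If a non-mutating version is preferred, the two ideas can be combined: collect only the fields that are set and pass them to dataclasses.replace as keyword arguments. A minimal sketch; modified_copy is a hypothetical name and the classes mirror the ones in the question:

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class A:
    x: int
    y: int


@dataclass
class A_ModificationOptions:
    x: Optional[int] = None
    y: Optional[int] = None


def modified_copy(original, modification):
    # Hypothetical helper: keep only the fields the modification sets...
    changes = {
        f.name: getattr(modification, f.name)
        for f in dataclasses.fields(modification)
        if getattr(modification, f.name) is not None
    }
    # ...and let dataclasses.replace build a new instance from them.
    return dataclasses.replace(original, **changes)


original_A = A(x=1, y=2)
new_A = modified_copy(original_A, A_ModificationOptions(y=7))
print(original_A)  # A(x=1, y=2)
print(new_A)       # A(x=1, y=7)
```

Because replace goes through __init__, any __post_init__ validation runs again on the result; replace also raises ValueError if asked to change a field declared with init=False.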

    The in-place version below is basically what wjandrea suggested in the comments. I just prefer dataclasses.fields to vars, as it's guaranteed to return only dataclass fields and nothing extra from a non-dataclass superclass or from someone poking around and doing funny business.
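A tiny illustration of that difference (the class Mod and the attribute extra are made up): an attribute attached to an instance after construction shows up in vars() but not in fields(), so a fields()-based loop ignores it.

```python
from dataclasses import dataclass, fields


@dataclass
class Mod:
    x: int = 0


m = Mod(x=1)
m.extra = "set later"  # ad-hoc instance attribute, not a dataclass field

print(sorted(vars(m)))              # ['extra', 'x']
print([f.name for f in fields(m)])  # ['x']
```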

    def modify(original, modification):
        for field in fields(modification):
            value = getattr(modification, field.name)
            if value is not None:
                setattr(original, field.name, value)
    

    And your code works as proposed.

    original_A = A(x=1, y=2)
    print("A before modification: ", original_A) # A(x=1, y=2)
    
    modification_A = A_ModificationOptions(y=7)
    modify(original_A, modification_A)
    print("A after modification: ", original_A) # A(x=1, y=7)
    

    I renamed the function modify instead of modifyA since it never actually does anything specific to A. This one function will work for any @dataclass and the corresponding _ModificationOptions class. No need to rewrite it, even superficially.
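The question's Update 1 also asks about nested dataclasses. A minimal sketch combining the generated classes with a recursive variant of modify; modify_nested is a hypothetical name, and the nested field b is deliberately declared first so the loop has to keep going after recursing (returning at that point would skip x and y). One caveat: the generated A_Mod annotates b as Optional[B] rather than Optional[B_Mod]; nothing checks this at runtime, but the annotation is no longer precise.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Optional


def make_modification_dataclass(original_dataclass, new_class_name=None):
    # Condensed copy of the factory above.
    if new_class_name is None:
        new_class_name = original_dataclass.__name__ + "_ModificationOptions"
    new_class = type(new_class_name, original_dataclass.__bases__, {"__annotations__": {}})
    for field in fields(original_dataclass):
        new_class.__annotations__[field.name] = Optional[field.type]
        setattr(new_class, field.name, None)
    return dataclass(new_class)


def modify_nested(original, modification):
    # Hypothetical recursive variant of modify() for nested dataclasses.
    for field in fields(modification):
        value = getattr(modification, field.name)
        if value is None:
            continue  # field not set: keep the original value
        if is_dataclass(value):
            # Nested modification object: update the nested original in
            # place instead of replacing it, then continue with other fields.
            modify_nested(getattr(original, field.name), value)
        else:
            setattr(original, field.name, value)


@dataclass
class B:
    a1: int
    a2: int
    a3: int


@dataclass
class A:
    b: B  # nested field listed first on purpose
    x: int
    y: int


B_Mod = make_modification_dataclass(B)
A_Mod = make_modification_dataclass(A)

original_A = A(b=B(a1=3, a2=4, a3=5), x=1, y=2)
modify_nested(original_A, A_Mod(y=7, b=B_Mod(a2=19)))
print(original_A)  # A(b=B(a1=3, a2=19, a3=5), x=1, y=7)
```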
