
How can I remove duplicates from a list of dataclass-objects which each have a list as a field?


I have this code:

from dataclasses import dataclass
from typing import List

@dataclass(eq=True, frozen=True)
class TestClass:
    field1: str
    field_list: List[str]

duplicate_list = [TestClass("foo", ["bar", "cat"]), TestClass("foo", ["bar", "cat"]), TestClass("foo", ["bar", "caz"])]

def remove_duplicates(duplicate_list: List[TestClass]) -> List[TestClass]:
    return list(set(duplicate_list))

unique_list = remove_duplicates(duplicate_list)

Now I want to remove the duplicates from the list. I tried converting the list to a set, as shown above. I also tried using

return list( dict.fromkeys(duplicate_list) )

Neither approach works, because my class contains a list. The __hash__ method generated by the dataclasses module therefore fails with: TypeError: unhashable type: 'list'

What would the correct approach be to remove the duplicate dataclass instances? Do I need to write a custom __hash__ function, or could I replace the list with some form of immutable list?


Solution

  • You can replace the list with a tuple, Python's immutable sequence type. Use Tuple[str, ...] to allow any number of strings, matching the original List[str]:

    from dataclasses import dataclass
    from typing import List, Tuple
    
    
    @dataclass(eq=True, frozen=True)
    class TestClass:
        field1: str
        field_list: Tuple[str, ...]
    
    
    duplicate_list = [TestClass("foo", ("bar", "cat")), TestClass("foo", ("bar", "cat")), TestClass("foo", ("bar", "caz"))]
    

    Your original remove_duplicates implementation will then work correctly:

    def remove_duplicates(duplicate_list: List[TestClass]) -> List[TestClass]:
        return list(set(duplicate_list))
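    To answer the other half of the question: yes, you can also keep the field as a list and supply a custom __hash__ yourself. When eq=True and frozen=True but the class body already defines __hash__ (and unsafe_hash is not set), the dataclass decorator keeps your explicit implementation instead of generating one. Here is a minimal sketch of that approach:

    ```python
    from dataclasses import dataclass
    from typing import List


    @dataclass(eq=True, frozen=True)
    class TestClass:
        field1: str
        field_list: List[str]

        # An explicitly defined __hash__ is kept by the dataclass decorator,
        # so hashing no longer touches the unhashable list directly.
        def __hash__(self) -> int:
            # Convert the mutable list to a tuple so it can be hashed.
            return hash((self.field1, tuple(self.field_list)))


    duplicate_list = [
        TestClass("foo", ["bar", "cat"]),
        TestClass("foo", ["bar", "cat"]),
        TestClass("foo", ["bar", "caz"]),
    ]
    unique_list = list(set(duplicate_list))
    ```

    Note the caveat: frozen=True does not make the list itself immutable, so mutating field_list after an instance has been put in a set silently corrupts the set. That is why the tuple approach above is generally the safer choice. Also, if you need to preserve the original order of the elements, use list(dict.fromkeys(duplicate_list)) instead of list(set(duplicate_list)).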