
How can I remove duplicates from a list of dataclass-objects which each have a list as a field?


I have this code:

from dataclasses import dataclass
from typing import List

@dataclass(eq=True, frozen=True)
class TestClass:
    field1: str
    field_list: List[str]

duplicate_list = [TestClass("foo", ["bar", "cat"]), TestClass("foo", ["bar", "cat"]), TestClass("foo", ["bar", "caz"])]

def remove_duplicates(duplicate_list: List[TestClass]) -> List[TestClass]:
    return list(set(duplicate_list))

unique_list = remove_duplicates(duplicate_list)

Now I want to remove the duplicates from the list. I tried converting the list to a set, as shown above. I also tried using

return list( dict.fromkeys(duplicate_list) )

Neither approach works, because my class contains a list. The __hash__ method generated by the dataclasses module therefore fails with: TypeError: unhashable type: 'list'

What would the correct approach be to remove the duplicate dataclass instances? Do I need to write a custom __hash__ function, or could I replace the list with some form of immutable list?


Solution

  • You can replace the list with a tuple, Python's immutable sequence type. Use Tuple[str, ...] to allow any number of strings, matching the original List[str]:

    from dataclasses import dataclass
    from typing import List, Tuple
    
    
    @dataclass(eq=True, frozen=True)
    class TestClass:
        field1: str
        field_list: Tuple[str, ...]
    
    
    duplicate_list = [TestClass("foo", ("bar", "cat")), TestClass("foo", ("bar", "cat")), TestClass("foo", ("bar", "caz"))]
    

    Your original remove_duplicates implementation will then work correctly:

    def remove_duplicates(duplicate_list: List[TestClass]) -> List[TestClass]:
        return list(set(duplicate_list))
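    To answer the other half of the question: yes, you can also keep the field as a list and supply a custom __hash__ yourself. When eq=True and frozen=True but the class body already defines __hash__ (and unsafe_hash is not set), the dataclass decorator keeps your explicit implementation instead of generating one. Here is a minimal sketch of that approach:

    ```python
    from dataclasses import dataclass
    from typing import List


    @dataclass(eq=True, frozen=True)
    class TestClass:
        field1: str
        field_list: List[str]

        # An explicitly defined __hash__ is kept by the dataclass decorator,
        # so hashing no longer touches the unhashable list directly.
        def __hash__(self) -> int:
            # Convert the mutable list to a tuple so it can be hashed.
            return hash((self.field1, tuple(self.field_list)))


    duplicate_list = [
        TestClass("foo", ["bar", "cat"]),
        TestClass("foo", ["bar", "cat"]),
        TestClass("foo", ["bar", "caz"]),
    ]
    unique_list = list(set(duplicate_list))
    ```

    Note the caveat: frozen=True does not make the list itself immutable, so mutating field_list after an instance has been put in a set silently corrupts the set. That is why the tuple approach above is generally the safer choice. Also, if you need to preserve the original order of the elements, use list(dict.fromkeys(duplicate_list)) instead of list(set(duplicate_list)).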