How to make values in list of dictionary unique?


I have a list of dictionaries in Python, which looks like the following:

d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100}, {'feature_a':2, 'feature_b':'Jul', 'feature_c':150}, {'feature_a':1, 'feature_b':'Mar', 'feature_c':110}, ...]

What I want is for each combination of feature_a, feature_b and feature_c to appear only once.

For example, if three entries have the same feature_a and feature_b but feature_c values of 100, 100 and 150, then after the operation only two entries should remain: the ones with 100 and 150.

How can I achieve this?

================================================================ UPDATE:

OK, thanks to Anand's excellent answer, this works perfectly. However, I have a follow-up question.

Suppose we have a new feature_d and the list looks like:

d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'}, {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d': 'B'}, {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'}, ...]

and I only want to deduplicate on feature_a, _b and _c, ignoring feature_d. How can I achieve this?

Many thanks.


Solution

  • If the order of the initial d list is not important, you can take the .items() of each dictionary and convert it into a frozenset(), which is hashable (provided every value in the dictionary is itself hashable). You can then put those frozensets into a set() or frozenset() to drop the duplicates, and convert each one back to a dictionary. Example -

    uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
    

    Sets do not allow duplicate elements, which is what removes the repeated entries, but you lose the original order of the list. In Python 2.x the list(...) call is not needed, because map() already returns a list.


    Example/Demo -

    >>> import pprint
    >>> pprint.pprint(d)
    [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150}]
    >>> uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
    >>> pprint.pprint(uniq_d)
    [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150}]
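
If the original order does matter, the same frozenset idea can be combined with a "seen" set instead of converting the whole list to a set. This is only a minimal sketch: dedupe_dicts is a hypothetical helper name, not something from the question, and it assumes every value in the dictionaries is hashable.

```python
def dedupe_dicts(records):
    """Drop exact-duplicate dictionaries, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        # A frozenset of (key, value) pairs is hashable and ignores key order.
        fingerprint = frozenset(record.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},  # repeat of the first entry
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
]

uniq_d = dedupe_dicts(d)
```

Here uniq_d keeps the first, second, third and fifth entries in their original positions; only the repeated fourth entry is dropped.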
    

    For the new requirement -

    However, what if I have another feature_d but only want to dedup on feature_a, _b and _c?

    If two entries have the same feature_a, _b and _c, they are considered duplicates, no matter what is in feature_d.

    A simple way to do this is to keep a set of the combinations already seen and build a new list, appending an entry only if its (feature_a, feature_b, feature_c) tuple is not yet in the set. Example -

    seen_set = set()
    new_d = []
    for i in d:
        # Only these three features decide whether an entry is a duplicate.
        key = (i['feature_a'], i['feature_b'], i['feature_c'])
        if key not in seen_set:
            seen_set.add(key)
            new_d.append(i)
    

    Example/Demo -

    >>> d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'},
    ...  {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d': 'B'},
    ...  {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'},
    ...  {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'G'}]
    >>> seen_set = set()
    >>> new_d = []
    >>> for i in d:
    ...     if tuple([i['feature_a'],i['feature_b'],i['feature_c']]) not in seen_set:
    ...         new_d.append(i)
    ...         seen_set.add(tuple([i['feature_a'],i['feature_b'],i['feature_c']]))
    ...
    >>> pprint.pprint(new_d)
    [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'}]
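
The seen-set loop can also be wrapped in a small function so the list of features to compare is a parameter. This is a rough sketch under a few assumptions: dedupe_by and its keys argument are hypothetical names introduced here, every record is assumed to contain all of the listed keys (a missing key raises KeyError), and the compared values are assumed to be hashable.

```python
def dedupe_by(records, keys):
    """Keep the first record for each distinct combination of the given keys."""
    seen = set()
    unique = []
    for record in records:
        # Build a hashable tuple from only the features that should be compared.
        key = tuple(record[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'G'},
]

new_d = dedupe_by(d, ['feature_a', 'feature_b', 'feature_c'])
```

As in the demo, the entry with feature_d 'G' is dropped because its feature_a, _b and _c match the earlier 'F' entry. The first occurrence wins, so which feature_d survives depends on the order of the input list.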