
Count sublists with the same values in a list


How to count sublists with the same values (order doesn't matter) in a list?

I tried this:

from collections import Counter

li = [
    [
        'Test123', 'heyhey123', 'another_unique_value',
    ],
    [
        'Test123', 'heyhey123', 'another_unique_value',
    ],
    [
        'heyhey123',
    ],
    [
        'Test123', 'heyhey123',
    ],
    [
        'another_unique_value', 'heyhey123', 'Test123'
    ]
]

Counter(str(e) for e in li)

Output:

Counter({
    "['Test123', 'heyhey123', 'another_unique_value']": 2,
    "['heyhey123']": 1,
    "['Test123', 'heyhey123']": 1,
    "['another_unique_value', 'heyhey123', 'Test123']": 1,
})

Obviously this takes the order of the values in each sublist into account. How do I count the sublists so that the order doesn't matter?

The output I want is:

Counter({
    "['Test123', 'heyhey123', 'another_unique_value']": 3,
    "['heyhey123']": 1,
    "['Test123', 'heyhey123']": 1,
})

Solution

  • You can replace

    Counter(str(e) for e in li)
    

    with

    Counter(tuple(sorted(e)) for e in li)
    

    This gives the output:

    Counter({('Test123', 'another_unique_value', 'heyhey123'): 3,
             ('heyhey123',): 1,
             ('Test123', 'heyhey123'): 1})
    

    Another option would be to use set(e) to ignore the order of elements in each sublist, but this has the downside of ignoring repetitions: ['Test123', 'heyhey123', 'another_unique_value'] would be counted as the same as ['Test123', 'heyhey123', 'another_unique_value', 'another_unique_value']. In addition, a set is unhashable, so it cannot be used as a Counter key directly, and converting it to a tuple or string does not guarantee the same element order for two equal sets (frozenset(e) is hashable, but it still drops repetitions).
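For reference, here is a minimal self-contained sketch of the sorted-tuple approach run on the question's sample data. The names `counts`, `first_seen`, `labelled` and `multiset_key` are illustrative only. It also shows the repetition problem with sets, maps the sorted keys back to the string labels the question asked for (assuming the first-seen ordering of each group is the one you want to display), and, as a swapped-in alternative not taken from the answer above, uses a frozenset of (value, count) pairs, which ignores order and keeps repetitions even when the values cannot be sorted against each other.

```python
from collections import Counter

# Sample data from the question
li = [
    ['Test123', 'heyhey123', 'another_unique_value'],
    ['Test123', 'heyhey123', 'another_unique_value'],
    ['heyhey123'],
    ['Test123', 'heyhey123'],
    ['another_unique_value', 'heyhey123', 'Test123'],
]

# Order-insensitive, repetition-preserving key: sort, then freeze as a tuple
counts = Counter(tuple(sorted(e)) for e in li)
print(counts)

# Reproduce the question's desired format: label each group with the
# string form of the first sublist seen for that key (an assumption)
first_seen = {}
for e in li:
    first_seen.setdefault(tuple(sorted(e)), str(e))
labelled = Counter({first_seen[k]: n for k, n in counts.items()})
print(labelled)

# Why set(e) is risky: repeated values collapse, so these compare equal
a = ['Test123', 'heyhey123', 'another_unique_value']
b = ['Test123', 'heyhey123', 'another_unique_value', 'another_unique_value']
print(frozenset(a) == frozenset(b))          # True: the duplicate is lost
print(tuple(sorted(a)) == tuple(sorted(b)))  # False: the duplicate is kept


# Hypothetical helper: a multiset key for values that are hashable but not
# mutually comparable (sorted() would raise TypeError on e.g. ['x', 1])
def multiset_key(items):
    return frozenset(Counter(items).items())


print(multiset_key(a) == multiset_key(b))                # False
print(multiset_key(['x', 1]) == multiset_key([1, 'x']))  # True
```

The sorted tuple is usually the simplest choice when all values in a sublist are comparable (all strings here); the frozenset of counts trades a little readability for not needing an ordering at all.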