Search code examples
pythonpython-3.xsortingnamedtuple

Sort a list of namedtupled by the most frequent fields


What are some elegant and quick easy ways to sort a list of namedtuple by the most frequent elements in the list?

For example, we have this list

character_list = [
    Element(id=1, character='A'),
    Element(id=2, character='B'),
    Element(id=3, character='B'),
    Element(id=4, character='C'),
    Element(id=5, character='D'),
    Element(id=6, character='E'),
    Element(id=7, character='F'),
    Element(id=8, character='H'),
    Element(id=9, character='I'),
    Element(id=10, character='J'),
    Element(id=11, character='K'),
    Element(id=12, character='L'),
    Element(id=13, character='M'),
    Element(id=14, character='J'),
    Element(id=15, character='N'),
    Element(id=16, character='J')]

And sorted in like this?

character_list = [
    Element(id=10, character='J'),
    Element(id=14, character='J'),
    Element(id=16, character='J'),
    Element(id=2, character='B'),
    Element(id=3, character='B'),
    Element(id=1, character='A'),
    Element(id=4, character='C'),
    Element(id=5, character='D'),
    Element(id=6, character='E'),
    Element(id=7, character='F'),
    Element(id=8, character='H'),
    Element(id=9, character='I'),
    Element(id=11, character='K'),
    Element(id=12, character='L'),
    Element(id=13, character='M'),
    Element(id=14, character='J'),
    Element(id=15, character='N')]

Try this, but doesn't seem to have the results that I am looking for

sorted(character_list, key=lambda x: character_list.count(x.character))

Solution

  • x.character is never in your list. In any case, using list.count like this is highly inefficient. Sorting is O(N*log N), however, if your key-function uses list.count, it will make everything deteriorate to O(N**2).

    Instead, build a dictionary of counts, and use that dictionary, this will maintain your O(N*log N) performance. So given:

    >>> from pprint import pprint
    >>> pprint(character_list)
    [Element(id=1, character='A'),
     Element(id=2, character='B'),
     Element(id=3, character='B'),
     Element(id=4, character='C'),
     Element(id=5, character='D'),
     Element(id=6, character='E'),
     Element(id=7, character='F'),
     Element(id=8, character='H'),
     Element(id=9, character='I'),
     Element(id=10, character='J'),
     Element(id=11, character='K'),
     Element(id=12, character='L'),
     Element(id=13, character='M'),
     Element(id=14, character='J'),
     Element(id=15, character='N'),
     Element(id=16, character='J')]
    

    Then

    >>> from collections import Counter
    >>> counts = Counter(e.character for e in character_list)
    >>> counts
    Counter({'J': 3, 'B': 2, 'A': 1, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'H': 1, 'I': 1, 'K': 1, 'L': 1, 'M': 1, 'N': 1})
    

    Finally,

    >>> def keyfunc(e):
    ...     return counts[e.character]
    ...
    >>> sorted_character_list = sorted(character_list, key=keyfunc, reverse=True)
    >>> pprint(sorted_character_list)
    [Element(id=10, character='J'),
     Element(id=14, character='J'),
     Element(id=16, character='J'),
     Element(id=2, character='B'),
     Element(id=3, character='B'),
     Element(id=1, character='A'),
     Element(id=4, character='C'),
     Element(id=5, character='D'),
     Element(id=6, character='E'),
     Element(id=7, character='F'),
     Element(id=8, character='H'),
     Element(id=9, character='I'),
     Element(id=11, character='K'),
     Element(id=12, character='L'),
     Element(id=13, character='M'),
     Element(id=15, character='N')]