python pandas split-apply-combine

Get the most common value of one column for each value of another column


I want the most common letter for each number. I've tried a variety of things, but I'm not sure what the right way is.

import pandas as pd
from pandas import DataFrame, Series

original = DataFrame({
    'letter': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}, 
    'number': {0: '01', 1: '01', 2: '02', 3: '02', 4: '02'}
})

expected = DataFrame({'most_common_letter': {'01': 'A', '02': 'B'}})

Ideally I'm looking to maximize readability.


Solution

  • We can use the DataFrame.mode() method on each group:

    In [43]: original.groupby('number')[['letter']] \
               .apply(lambda x: x.mode()) \
               .reset_index(level=1, drop=True)
    Out[43]:
           letter
    number
    01          A
    02          B
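
If you want something shaped exactly like `expected` (a single `most_common_letter` column and an unnamed index), a minimal sketch along these lines may read a bit more directly. It assumes that keeping only one letter per number is acceptable: `mode()` returns every tied value, so `.iloc[0]` here simply keeps the first one it returns. The variable name `most_common` is just for illustration.

```python
import pandas as pd

# Same data as in the question.
original = pd.DataFrame({
    'letter': ['A', 'A', 'A', 'B', 'B'],
    'number': ['01', '01', '02', '02', '02'],
})

most_common = (
    original.groupby('number')['letter']
    # mode() may return several values on a tie; keep the first (assumption).
    .agg(lambda s: s.mode().iloc[0])
    .rename('most_common_letter')
    .to_frame()
    # Drop the index name so the frame matches `expected` from the question.
    .rename_axis(None)
)

print(most_common)
```

For the sample data this gives A for '01' and B for '02'. If ties matter to you, the `apply(lambda x: x.mode())` version above keeps every tied letter as its own row instead.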