python pandas dataframe group-by

For each group in Pandas dataframe, return the most common value if it shows up more than `x%` of the time


Given a pandas dataframe, I would like to return, for each group, the most common value of a string column if that value appears in more than n% of the group's rows, and 'NA' otherwise.


Solution

  • To test the most common value against an absolute count:

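    import numpy as np

    N = 5  # the most common value must occur more than N times
    def f(x):
        # value_counts() sorts in descending order, so position 0 is the most common value
        y = x.value_counts()
        return y.index[0] if y.iat[0] > N else np.nan


    df = df.groupby('g')['col'].agg(f).reset_index(name='new')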
    

    Or by percentages:

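    import numpy as np

    n = 50  # threshold in percent
    def f(x):
        # normalize=True gives relative frequencies; multiply by 100 for percentages
        y = x.value_counts(normalize=True) * 100
        # replace np.nan with 'NA' here if the literal string is wanted
        return y.index[0] if y.iat[0] > n else np.nan


    df = df.groupby('g')['col'].agg(f).reset_index(name='new')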
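
To see the percentage variant on concrete data, here is a minimal sketch. The sample dataframe and the function name `most_common_if_frequent` are made up for illustration; only the column names `g` and `col` come from the answer above. It also assumes you want a missing value, rather than an `IndexError`, for a group that contains only missing values, since `value_counts()` drops them by default and would otherwise return an empty result for that group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({
    "g":   ["a", "a", "a", "b", "b", "b", "b", "c"],
    "col": ["x", "x", "y", "x", "y", "z", "z", None],
})

def most_common_if_frequent(s, min_pct=50):
    """Return the most common value of s if it covers more than min_pct
    percent of the non-null rows, otherwise NaN."""
    # Sorted in descending order; missing values are dropped by default.
    shares = s.value_counts(normalize=True) * 100
    if shares.empty:  # the group holds only missing values
        return np.nan
    return shares.index[0] if shares.iat[0] > min_pct else np.nan

out = df.groupby("g")["col"].agg(most_common_if_frequent).reset_index(name="new")
print(out)
```

In this data, group `a` passes the threshold (`x` covers two of three rows), group `b` does not (`z` covers exactly 50%, which is not more than 50), and group `c` has no non-null values. Note that the percentages are computed over non-null rows only. If the literal string from the question is needed, `out["new"].fillna("NA")` should convert the missing results afterwards.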