Given a pandas DataFrame, I would like to return, for each group, the most common value of a string column if that value appears in more than n% of the group's rows, and 'NA' otherwise.
If you need to test the most common value against an absolute count:
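import numpy as np

N = 5

def f(x):
    # value_counts() sorts by frequency, most common value first
    y = x.value_counts()
    # keep the top value only if it occurs more than N times
    return y.index[0] if y.iat[0] > N else np.nan

df = df.groupby('g')['col'].agg(f).reset_index(name='new')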
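To see what the count-based version returns, here is a small self-contained run. The sample data and the threshold N = 2 are made up for illustration; only the column names 'g' and 'col' come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    "g":   ["a", "a", "a", "a", "b", "b", "b"],
    "col": ["x", "x", "x", "y", "p", "q", "q"],
})

N = 2  # illustrative threshold: the top value must occur more than N times

def f(x):
    y = x.value_counts()  # most common value first
    return y.index[0] if y.iat[0] > N else np.nan

out = df.groupby("g")["col"].agg(f).reset_index(name="new")
print(out)
```

In group 'a' the value 'x' occurs 3 times, which exceeds N, so it is returned. In group 'b' the top value 'q' occurs only twice, so the result is NaN.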
Or, to test against a percentage of the group's rows:
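n = 50

def f(x):
    # normalize=True gives fractions; multiply by 100 for percentages
    y = x.value_counts(normalize=True) * 100
    # keep the top value only if it covers more than n% of the group
    return y.index[0] if y.iat[0] > n else np.nan

df = df.groupby('g')['col'].agg(f).reset_index(name='new')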
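Since the question asks for the string 'NA' rather than NaN, a variant along these lines may be closer to what you want. This is only a sketch: the helper name top_value_or_na, its parameters, and the sample data are hypothetical. It also guards against a group whose values are all missing, where value_counts() comes back empty and y.iat[0] would raise an IndexError.

```python
import pandas as pd

def top_value_or_na(x, pct=50, fill="NA"):
    """Return the most common value of x if it covers more than pct percent
    of the non-missing rows, otherwise return fill. (Hypothetical helper.)"""
    y = x.value_counts(normalize=True) * 100
    if y.empty:  # group contains only missing values
        return fill
    return y.index[0] if y.iat[0] > pct else fill

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    "g":   ["a", "a", "a", "b", "b", "b", "b"],
    "col": ["x", "x", "y", "p", "q", "r", "r"],
})

out = (
    df.groupby("g")["col"]
      .agg(lambda s: top_value_or_na(s, pct=50))
      .reset_index(name="new")
)
print(out)
```

Here 'x' covers two of the three rows in group 'a', so it is returned, while in group 'b' the top value 'r' covers exactly 50%, which is not strictly more than the threshold, so the result is 'NA'. Note that value_counts() drops missing values by default, so the percentage is computed over the non-missing rows of each group.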