I have this data frame. How can I find the 3 most repeated numbers in column b?
import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2,3], "b": [np.nan, np.nan, '2,3', 3, '3,5,1',2]})
I guess the answer should be 3,2,5 or 3,2,1
Split column b around the delimiter "," and then use explode to transform each element of the resulting lists into its own row. Finally, use value_counts + head to get the top 3 repeated elements:
df['b'].dropna().astype(str).str.split(',')\
.explode().value_counts().head(3).index.tolist()
explode is available in pandas version >= 0.25. For pandas version < 0.25, use:
pd.value_counts(np.hstack(df['b'].dropna().astype(str).str.split(','))).head(3).index.tolist()
['3', '2', '5']
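To see why the result comes out this way, here is a minimal sketch that breaks the chain into steps and prints the counts. The variable names (tokens, exploded, counts, top3, top3_with_ties) are only illustrative. The last step uses nlargest with keep="all", which is my own suggestion rather than part of the answer above, to show every value tied for third place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# Drop missing values, turn every entry into a string, split on commas
tokens = df["b"].dropna().astype(str).str.split(",")

# One row per individual number (requires pandas >= 0.25)
exploded = tokens.explode()

# Number of occurrences of each value, most frequent first
counts = exploded.value_counts()
print(counts)

# Top 3 labels; note they are strings because of astype(str)
top3 = counts.head(3).index.tolist()
print(top3)

# Assumption: if ties matter, keep every value tied with the third place
top3_with_ties = counts.nlargest(3, keep="all").index.tolist()
print(top3_with_ties)
```

Here 3 occurs three times and 2 occurs twice, while 5 and 1 each occur once, so either of them can legitimately take third place. That is why both 3,2,5 and 3,2,1 from the question are acceptable answers.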