
How can I find the most frequent numbers in a column?


I have the data frame below. How can I find the 3 most frequently repeated numbers in column b?

import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2,3], "b": [np.nan, np.nan, '2,3', 3, '3,5,1',2]})

I expect the answer to be 3, 2, 5 or 3, 2, 1 (5 and 1 each appear only once).


Solution

  • Drop the NaN values, cast the remaining values to str (some of them are integers), split column b on the delimiter ",", then use explode to turn each element of the resulting lists into its own row; finally, use value_counts + head to get the 3 most repeated elements:

    df['b'].dropna().astype(str).str.split(',')\
           .explode().value_counts().head(3).index.tolist()
    

    explode is available in pandas >= 0.25; for older versions, use:

    pd.value_counts(np.hstack(df['b'].dropna().astype(str).str.split(','))).head(3).index.tolist()
    

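    Both versions return:

    ['3', '2', '5']

    Note that the elements are strings, and that 5 and 1 are tied with one occurrence each, so which of them comes third is not guaranteed.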
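A self-contained sketch of the same explode approach (pandas >= 0.25). Two steps here are assumptions added for illustration rather than part of the answer above: the exploded strings are converted back to int, and ties are broken explicitly (higher count first, then the smaller number), so the result does not depend on how value_counts happens to order equal counts.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, '2,3', 3, '3,5,1', 2]})

# One row per number: drop NaN, cast the mixed int/str values to str,
# split on ",", explode the lists into rows, and convert back to int.
numbers = (df['b'].dropna()
                  .astype(str)
                  .str.split(',')
                  .explode()
                  .astype(int))

# Occurrences of each number.
counts = numbers.value_counts()

# Assumed tie-break rule: higher count first, then the smaller number.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
top3 = [int(value) for value, _ in ranked[:3]]
print(top3)
```

With this tie-break rule the result is [3, 2, 1]; sorting ties by the larger number instead (key `(-kv[1], -kv[0])`) would give [3, 2, 5].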
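If you would rather not depend on the pandas version at all, the same counting can be sketched with collections.Counter from the standard library. This is a swapped-in technique, not the pandas approach from the answer, and the hard-coded list below simply mirrors column b from the question.

```python
import math
from collections import Counter

# Column b from the question, as a plain Python list.
values = [math.nan, math.nan, '2,3', 3, '3,5,1', 2]

counter = Counter()
for v in values:
    # Skip missing values (NaN is the only float in this column).
    if isinstance(v, float) and math.isnan(v):
        continue
    # str(3) -> '3' splits into ['3']; '3,5,1' splits into three parts.
    counter.update(int(x) for x in str(v).split(','))

# most_common orders equal counts by first appearance (Python 3.7+).
top3 = [n for n, _ in counter.most_common(3)]
print(top3)
```

Because 5 is encountered before 1, this version returns [3, 2, 5].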