
pandas: how to get all rows with a specific count of values


I have a dataframe

df = 

    C1 C2
    a.  2
    d.  8  
    d.  5  
    d.  5  
    b.  3
    b.  4
    c.  5
    a.  6
    b.  7

I want to find all the rows whose C1 value occurs at most 2 times in the column, and add a new column that is low for those rows and keeps the original C1 value otherwise. The new df should look like this:

df_new = 

    C1 C2 type
    a.  2  low
    d.  8  d
    d.  5  d
    d.  5  d
    b.  3  b
    b.  4  b
    c.  5  low
    a.  6  low
    b.  7  b

How can I do this?

I also want to get back a list of all the types that were low (['a','c'] here)

Thanks


Solution

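  • You can group on 'C1' with pandas.DataFrame.groupby and check the size of each group. A lambda passed to the groupby's transform then returns 'low' for groups with at most 2 rows and the group's original value (without the trailing '.') otherwise. Alternatively, compute the per-row group size with transform('size') and apply numpy.where to the result.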

    import numpy as np

    # 'low' if the row's C1 group has at most 2 rows,
    # otherwise the C1 value without its trailing '.'
    df['type'] = df.groupby('C1')['C1'].transform(lambda g: 'low' if len(g) <= 2 else g.iloc[0][:-1])

    # Or we can use 'numpy.where' on the per-row group size
    g = df.groupby('C1')['C1'].transform('size')
    df['type'] = np.where(g <= 2, 'low', df['C1'].str[:-1])
    print(df)
    

    Output:

       C1  C2 type
    0  a.   2  low
    1  d.   8    d
    2  d.   5    d
    3  d.   5    d
    4  b.   3    b
    5  b.   4    b
    6  c.   5  low
    7  a.   6  low
    8  b.   7    b
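
The question also asks for the list of types that ended up as low, which the snippet above does not return. One way to get it is to keep the boolean mask and reuse it. This is a minimal sketch, assuming the sample data from the question and that the trailing '.' should be stripped as in the answer. The names sizes, is_low and low_types are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (C1 values keep their trailing '.')
df = pd.DataFrame({
    'C1': ['a.', 'd.', 'd.', 'd.', 'b.', 'b.', 'c.', 'a.', 'b.'],
    'C2': [2, 8, 5, 5, 3, 4, 5, 6, 7],
})

# Number of rows in each row's C1 group, aligned with df's index
sizes = df.groupby('C1')['C1'].transform('size')
is_low = sizes <= 2

# Same labelling as the numpy.where variant above
df['type'] = np.where(is_low, 'low', df['C1'].str[:-1])

# Distinct C1 values that were labelled 'low', with the trailing '.' stripped
low_types = sorted(df.loc[is_low, 'C1'].str[:-1].unique().tolist())
print(low_types)
```

Keeping the mask in its own variable avoids recomputing the group sizes, and the same mask can be used to filter the rows themselves with df[is_low].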