I have a dataframe
df =
C1 C2
a. 2
d. 8
d. 5
d. 5
b. 3
b. 4
c. 5
a. 6
b. 7
I want to add a new column, type, that is low for every row whose C1 value occurs at most 2 times in the column, and keeps the original C1 value otherwise. The new df should look like this:
df_new =
C1 C2 type
a. 2 low
d. 8 d
d. 5 d
d. 5 d
b. 3 b
b. 4 b
c. 5 low
a. 6 low
b. 7 b
How can I do this?
I also want to get back a list of all the C1 values that were labeled low (['a','c'] here).
Thanks
You can use pandas.DataFrame.groupby to group by 'C1' and count the rows in each group. Then pass a lambda to transform on the grouped column that returns low when the group has at most 2 rows, or the group's original value otherwise. Alternatively, compute the group sizes with transform('size') and use numpy.where on the result.
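import numpy as np

# Option 1: 'low' for groups with at most 2 rows, else the C1 value without its trailing dot
df['type'] = df.groupby('C1')['C1'].transform(lambda g: 'low' if len(g) <= 2 else g.iloc[0][:-1])

# Option 2: broadcast each group's size to its rows, then choose with numpy.where
g = df.groupby('C1')['C1'].transform('size')
df['type'] = np.where(g <= 2, 'low', df['C1'].str[:-1])
print(df)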
Output:
C1 C2 type
0 a. 2 low
1 d. 8 d
2 d. 5 d
3 d. 5 d
4 b. 3 b
5 b. 4 b
6 c. 5 low
7 a. 6 low
8 b. 7 b
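
The question also asks for the list of values that ended up as low. One way to get it is sketched below. It assumes the sample data from the question, rebuilt here so the snippet runs on its own, and that the trailing dot should be stripped the same way as in the type column. The names counts and low_types are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (C1 values carry a trailing dot).
df = pd.DataFrame({
    "C1": ["a.", "d.", "d.", "d.", "b.", "b.", "c.", "a.", "b."],
    "C2": [2, 8, 5, 5, 3, 4, 5, 6, 7],
})

# Size of each C1 group, broadcast back to every row.
counts = df.groupby("C1")["C1"].transform("size")

# Distinct C1 values that occur at most twice, trailing dot removed,
# in order of first appearance.
low_types = df.loc[counts <= 2, "C1"].str[:-1].unique().tolist()
print(low_types)  # ['a', 'c']
```

Reusing the same group sizes for both the type column and this list keeps the two results consistent if the threshold changes.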