I have a dataframe "data" that looks like this:
f1 | f2 | f3 |
---|---|---|
11 | 34 | a |
14 | 10 | a |
20 | 12 | a |
15 | 19 | b |
19 | 29 | b |
29 | 30 | b |
I want to find the minimum value of f2 IF f3 is a. I don't want to find min value of f2 for f3 = a or f3 = b, just when f3 = a. And then I want to remove the observation that is associated with that min value in f2 in the dataframe. So I had this code:
a_part = data[data['f3'] == 'a'
min1 = a_part['f2'].min()
min1 = data['f2'] = min1
data_new_1 = pd.dataframe(data.loc[~min1])
Which works well. And now my dataframe looks like:
f1 | f2 | f3 |
---|---|---|
11 | 34 | a |
20 | 12 | a |
15 | 19 | b |
19 | 29 | b |
29 | 30 | b |
However, I want to remove the min value of f2 when f3 = a and the associated observation one by one by using a loop, and having a new dataframe each time. So essentially data_new_2 looks like:
f1 | f2 | f3 |
---|---|---|
11 | 34 | a |
15 | 19 | b |
19 | 29 | b |
29 | 30 | b |
Until there's only b left in f3. I tried to do a loop for it:
for i in range(1,6):
IN = data_new_i[['f3'] == 'a']
min1 = a_part['f2'].min()
min1 = data_new_i['f2'] == min1
vars()[data_new_i++] = pd.DataFrame(data.loc[~min1])
And this doesn't work. I'm very unfamiliar with the way Python treats new dataframe name with loop index. I think I have to use dict to put new dataframe in, but I don't know how I can extract a column of a dataframe from a dict, and how I can save the new dataframe into the dict. Can someone help me please?
In my solution ouput is list of DataFrames.
If there are always unique values in column f2
use loop by index values of sorted column and drop row by minimal value:
out = []
data1 = data.sort_values('f2')
for i in data1.loc[data1['f3'] == 'a', 'f2'].index:
data = data.drop(i)
out.append(data)
print (out)
[ f1 f2 f3
0 11 34 a
2 20 12 a
3 15 19 b
4 19 29 b
5 29 30 b, f1 f2 f3
0 11 34 a
3 15 19 b
4 19 29 b
5 29 30 b, f1 f2 f3
3 15 19 b
4 19 29 b
5 29 30 b]
If possible duplicates and need remove all dupes, like here in first loop all rows with f2=10
use:
print (data)
f1 f2 f3
0 11 10 a
1 14 10 a
2 20 12 a
3 15 19 b
4 19 29 b
5 29 30 b
out = []
data1 = data.sort_values('f2')
for i, g in data1.groupby(data1.loc[data1['f3'] == 'a', 'f2']):
data = data.drop(g.index)
out.append(data)
print (out)
[ f1 f2 f3
2 20 12 a
3 15 19 b
4 19 29 b
5 29 30 b, f1 f2 f3
3 15 19 b
4 19 29 b
5 29 30 b]
It is not recommended, but possible create DataFrames by groups:
data1 = data.sort_values('f2')
for j, (i, g) in enumerate(data1.groupby(data1.loc[data1['f3'] == 'a', 'f2']), 1):
data = data.drop(g.index)
globals()[f'data_new_{j}'] = data
print (data_new_1)
f1 f2 f3
2 20 12 a
3 15 19 b
4 19 29 b
5 29 30 b
print (data_new_2)
f1 f2 f3
3 15 19 b
4 19 29 b
5 29 30 b