I have a column that has a number value (15, ..., 28, etc) called 'big', but depending on this number it should sum the columns with 5 previous numbers columns names, I mean something like...
big | c15 | c16 | c17 | ... | c27 | c28 |
---|---|---|---|---|---|---|
23 | 1 | 0 | 1 | ... | 1 | 0 |
21 | 1 | 1 | 0 | ... | 1 | 1 |
... | 0 | 0 | 1 | ... | 1 | 0 |
25 | 1 | 0 | 1 | ... | 1 | 1 |
So, depending on the "big" column, for example, 25, my new column should sum 'c24'+'c23'+'c22'+'c21'+'c20' and the result must be calculated in the new column name.
I have tried several movements but it doesn't works. I show my code below:
def test_fun(df):
if (df['big'] > 19).all():
pc = []
for i in range(1,6):
x = 'c' + (df['big'] - i).apply(str)
pc.append(x)
y = df[pc].sum(axis = 1)
return y
elif (df['big'] == 19).all():
pc = []
for i in range(1,5):
x = 'c' + (df['big'] - i).apply(str)
pc.append(x)
y = df[pc].sum(axis = 1)
return y
elif (df['big'] == 18).all():
pc = []
for i in range(1,4):
x = 'c' + (df['big'] - i).apply(str)
pc.append(x)
y = df[pc].sum(axis = 1)
return y
else:
pc = []
for i in range(1,3):
x = 'c' + (df['big'] - i).apply(str)
pc.append(x)
y = df[pc].sum(axis = 1)
return y
df['new_column'] = df.apply(lambda row: test_fun(df), axis = 1)
I added several conditions due to actually my table is beginning from c15 to c28 column, but it will be increasing during time.
Finally, when I use the function df.apply()
to apply my function by row I had been having several errors during my trials. Some of them like:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
that's what I added .all()
in my if, elif, else
conditions. Even...
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index 'c27', 'c27', ...,\n ('c26', 'c26',... dtype='object')] are in the [columns]"
Do you know what I should be probably doing wrong?
One way using pandas.DataFrame.apply
:
def get_big(series):
n = series["big"]
indices = ["c%s" % i for i in range(n-1, n-6, -1)]
indices = series.index.intersection(indices)
return series[indices].sum()
df.apply(get_big, axis=1)
Sample data
c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 big
0 0 1 1 0 1 0 1 1 1 0 21
1 1 0 1 0 0 0 1 0 1 0 28
2 1 1 0 1 0 1 0 1 0 0 20
3 0 0 0 0 1 0 0 1 0 1 20
4 1 1 0 1 0 0 0 0 0 0 23
5 1 0 0 1 0 0 0 1 0 0 25
6 0 1 0 0 1 1 1 0 1 0 23
7 1 0 1 0 0 0 0 1 0 1 20
8 1 0 1 0 1 1 0 0 0 1 26
9 0 0 0 1 1 0 1 1 0 1 25
Output:
0 0
1 1
2 0
3 0
4 2
5 2
6 1
7 0
8 3
9 2
dtype: int64