I have the following data frame df:
import pandas as pd

df = pd.DataFrame({'result': ['s17h10e7', 's5e3h2S105h90e15',
                              's17H10e7S5e3H2s105h90e15'],
                   'status': [102, 117, 205]})
result                    status
s17h10e7                  102
s5e3h2S105h90e15          117
s17H10e7S5e3H2s105h90e15  205
I have a function named get_number_after_code that reads a string and returns the sum of the numbers that immediately follow each occurrence of a user-defined code (e.g. a letter):
def get_number_after_code(string_to_read, code):
    code_indices = [i for i, char in enumerate(string_to_read) if char == code]
    joined_numbers = []
    list_of_int_values = []
    for idx in code_indices:
        temp_number = []
        for character in string_to_read[idx + 1: ]:
            if not character.isdigit():
                break
            else:
                temp_number.append(character)
        joined_numbers = ''.join(temp_number)
        list_of_int_values.append(int(joined_numbers))
    return sum(list_of_int_values)
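For comparison, the same idea can be sketched more compactly with a regular expression. This is an alternative technique, not the code from the question: the name sum_numbers_after_code is hypothetical, and unlike the loop above it silently skips a code that is not followed by any digit instead of raising a ValueError.

```python
import re


def sum_numbers_after_code(text, code):
    """Sum every run of ASCII digits that directly follows `code` in `text`.

    Hypothetical helper: a regex-based sketch of the loop-based function.
    """
    # re.escape keeps codes such as '.' or '+' from being read as regex syntax.
    pattern = re.escape(code) + r"([0-9]+)"
    # findall returns only the captured digit runs, one per match.
    return sum(int(number) for number in re.findall(pattern, text))


print(sum_numbers_after_code("s5e3h2s105h90e15", "h"))  # 92 (2 + 90)
```

Matching is case-sensitive, just as in the loop-based version, so 'H' and 'h' are treated as different codes.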
Examples:
get_number_after_code('s5e3h2s105h90e15', 'h')
>> 92
get_number_after_code('s5e3h2s105h90e15', 's')
>> 110
I would like to add a column named col_NEW to the df dataframe. This column would hold the output of get_number_after_code() applied to each row's value in the result column. As an example, let's assume we use the code 'h' (but it could equally be 's' or 'e'). The output would be:
result                     status  col_NEW
s17h10e7                   102     10
s5e3h2s105h90e15           117     92
s17h10e7s5e3h2s105h807e15  205     819
To do this, I'm using:
df['col_NEW'] = df.apply(get_number_after_code(df['result'], 'h'), axis=1)
I'm getting this not-so-helpful AssertionError:
AssertionError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21060/915445793.py in <module>
----> 1 df['col_NEW'] = df.apply(count_tests_new(df['result'], 's'), axis=1)
~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwargs)
8738 kwargs=kwargs,
8739 )
-> 8740 return op.apply()
8741
8742 def applymap(
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply(self)
686 return self.apply_raw()
687
--> 688 return self.apply_standard()
689
690 def agg(self):
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
810
811 def apply_standard(self):
--> 812 results, res_index = self.apply_series_generator()
813
814 # wrap results
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
816
817 def apply_series_generator(self) -> tuple[ResType, Index]:
--> 818 assert callable(self.f)
819
820 series_gen = self.series_generator
AssertionError:
Am I using .apply() with the correct syntax to add col_NEW? If yes, does anyone know what is causing this AssertionError?
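The expression get_number_after_code(df['result'], 'h') is evaluated immediately, with the whole Series as its first argument, and df.apply then receives its return value (a number) instead of a function. That is why the assert callable(self.f) check in the traceback fails. Pass the function itself and let pandas call it for you. Since it seems you only need the "result" column, use apply on that column instead. You can pass the letter (for example "h") as a positional argument through args. See docs: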
df['col_NEW'] = df['result'].apply(get_number_after_code, args=('h',))
or by its keyword:
df['col_NEW'] = df['result'].apply(get_number_after_code, code='h')
Output:
                     result  status  col_NEW
0                  s17h10e7     102       10
1          s5e3h2S105h90e15     117       92
2  s17H10e7S5e3H2s105h90e15     205       90
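To make the difference concrete, here is a minimal self-contained sketch that reproduces the cause of the error and then applies the fix. The function is a condensed rewrite of the one in the question with an added guard for a code that has no digits after it. The column names col_lambda and col_any_case are made up for illustration. The lower-casing step is an optional assumption about what you want, since the third row gives 90 only because 'H' and 'h' are different codes.

```python
import pandas as pd


def get_number_after_code(string_to_read, code):
    """Sum the integers that directly follow each occurrence of `code`."""
    total = 0
    for idx, char in enumerate(string_to_read):
        if char != code:
            continue
        digits = ""
        for character in string_to_read[idx + 1:]:
            if not character.isdigit():
                break
            digits += character
        if digits:  # skip a code that is not followed by any digit
            total += int(digits)
    return total


df = pd.DataFrame({
    "result": ["s17h10e7", "s5e3h2S105h90e15", "s17H10e7S5e3H2s105h90e15"],
    "status": [102, 117, 205],
})

# The original call runs the function once on the whole Series and hands
# its return value (a number, not a function) to DataFrame.apply.
eager_value = get_number_after_code(df["result"], "h")
print(callable(eager_value))  # False

# Pass the function itself; pandas calls it once per string in the column.
df["col_NEW"] = df["result"].apply(get_number_after_code, code="h")

# Equivalent spelling with a lambda (hypothetical column name).
df["col_lambda"] = df["result"].apply(lambda s: get_number_after_code(s, "h"))

# Optional: lower-case first so that 'H' is counted too (hypothetical column).
df["col_any_case"] = df["result"].str.lower().apply(get_number_after_code, code="h")

print(df)
```

With the lower-casing step, the third row sums h10, h2 and h90 instead of h90 alone.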