I am trying to find if values in one column (Column_A) are contained in another column (Column_B) that have values that are separated by # signs. Then I would like to return the matching value in a new column (Column_C).
I have tried this:
def find_matching_value(row):
a_values=[float(x) for x in row[my_df1.Column_B].split('#') if x !='']
if row[my_df1.Column_A] in a_values:
return row['Column_A']
else:
return None
my_df2['Column_C']=my_df1.apply(find_matching_value, axis=1)
However, I get a key error:
KeyError: "None of [Index(['#16.990000#16.990000#16.990000#28.990000#28.990000#16.99#16.99#16.99#21.99#21.99#21.99,\n'#13.490000#13.490000#13.990000#11.99#11.99#12.49', \n'#13.490000#13.390000#14.490000#16.990000#16.990000#17.490000#12.49#12.49#12.99#15.99#15.99#15.99'dtype='object', length=3)] are in the [index]"
Not sure what is causing the issue here.
There is no need to use apply
here, you can slighty ajudst your listcomp :
my_df1["Column_C"] = [a if any([b == str(a) for b in lst]) else None
for a, lst in zip(my_df1["Column_A"],
my_df1["Column_B"].str.split("#"))]
Output :
print(my_df1)
Column_A Column_B Column_C
0 16.99 #16.990000#16.990000#16.990000#28.990000#28.990000#16.99#16.99#16.99#21.99#21.99#21.99 16.99
1 11.99 #13.49#13.49#13.99#11.99#11.99#12.49 11.99
2 12.49 #13.49#13.39#14.49#16.99#16.99#17.49#12.49#12.49#12.99#15.99#15.99#15.99 12.49