I am creating a new dummy variable based on a given column and a criterion. Below is the code I am working with. It works, but it is too slow for what I would like to do. Is there a faster, maybe vectorized, way to create dummies in pandas, specifically for my example?
I have looked up the get_dummies function in pandas but it seems to do something a little different than what I am doing here. I could be wrong though so if anyone has a way to make get_dummies work with this example, that would be an acceptable answer too.
def flagger(row, criteria, col):
    if row[col] <= criteria:
        return 1
    if row[col] > criteria:
        return 0
dstk['dropflag'] = dstk.apply(lambda row: flagger(row, criteria, col), axis=1)
Edit: There are two good answers here. At a glance they both look equally fast (at least to the same order of magnitude) so I just accepted one. If anyone wants to do some more serious profiling I would be happy to revise my answer choice.
Why not try np.where? It is a column-wise vectorized operation, so it is much faster than a row-wise apply.
import numpy as np

dstk['dropflag'] = np.where(dstk[col] <= criteria, 1, 0)
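For anyone who wants to try this on a small frame first, here is a minimal sketch. The sample data, the column name "price", and the threshold are made up for illustration; only dstk, col, criteria, and dropflag come from the question. It also shows an equivalent form that casts the boolean mask to integers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "price" and the threshold are illustrative only.
dstk = pd.DataFrame({"price": [1.0, 5.0, 10.0, np.nan]})
col = "price"
criteria = 5.0

# Vectorized flag with np.where: 1 where the condition holds, otherwise 0.
flag_where = np.where(dstk[col] <= criteria, 1, 0)

# Equivalent alternative: cast the boolean mask to integers.
flag_astype = (dstk[col] <= criteria).astype(int)

dstk["dropflag"] = flag_where
print(dstk)
```

One behavioral difference to be aware of: a comparison against NaN is False, so both vectorized forms flag missing values as 0, whereas the original flagger function falls through both if statements and returns None for them. If missing values matter in your data, handle them explicitly before or after flagging.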