My data looks like this:
I'm using the following script to populate the RP_8Recruise column as either "Y" (NEAR_DIST <= 100 meters) or "N" (NEAR_DIST > 100 meters).
nrows = plots_dist_joined.shape[0]
for i in range(0, nrows):
    # for plots that are within wanted distance from disturbance harvest
    if (plots_dist_joined.iloc[i,9] < 100) | (plots_dist_joined.iloc[i,9] == 100):
        plots_dist_joined["RP_"+reporting_period+"Recruise"] = "Y"
        plots_dist_joined["RP_"+reporting_period+"RecrType"] = "PD"
    # for plots that are NOT within wanted distance from disturbance harvest
    else:
        plots_dist_joined["RP_"+reporting_period+"Recruise"] = "N"
        plots_dist_joined["RP_"+reporting_period+"RecrType"] = np.nan
This populates the entire RP_8Recruise column as "N" even though there are distances that are under 100 meters (IDs = 59197, 40, 84, 92, 132). I'm not sure what is wrong in the code.
The problem with your code is that in each iteration, a new value is assigned to the entire RP_8Recruise and RP_8RecrType columns, not just to row i. The final values of these columns are therefore decided by the NEAR_DIST value in the last row.
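To see the overwrite in isolation, here is a minimal sketch. The frame `df`, the column name `flag`, and the distances are all made up for illustration; only the assignment pattern mirrors the loop in the question.

```python
import pandas as pd

# Hypothetical toy data; the distances are invented for illustration.
df = pd.DataFrame({"NEAR_DIST": [50.0, 250.0, 80.0, 300.0]})

for i in range(df.shape[0]):
    if df["NEAR_DIST"].iloc[i] <= 100:
        # Assigning a scalar to a column sets EVERY row, not just row i.
        df["flag"] = "Y"
    else:
        df["flag"] = "N"

# The last row (300.0) is far, so the whole column ends up as "N",
# even though rows 0 and 2 are within 100.
flags = df["flag"].tolist()
print(flags)
```

Swapping the last two distances would flip the entire column to "Y", which shows that only the final iteration matters.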
Instead of a for-loop, use the vectorized numpy.where() function to fill in the values:
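import numpy as np

# a mask that checks if the plot is within 100 meters
is_near = plots_dist_joined["NEAR_DIST"] <= 100
# if near, Y, else N
plots_dist_joined["RP_8Recruise"] = np.where(is_near, "Y", "N")
# if near, PD, else missing; None is used instead of np.nan because
# np.where can coerce np.nan to the string "nan" when mixed with "PD"
plots_dist_joined["RP_8RecrType"] = np.where(is_near, "PD", None)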
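For reference, a self-contained sketch of the vectorized approach on toy data. The IDs are the ones mentioned in the question plus one extra row, but the distances are invented, and the frame name `plots` and the `threshold` variable are assumptions for illustration. The type column is built with `Series.where` so that rows outside the threshold hold a real missing value rather than text.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: IDs from the question plus one far plot (ID 7);
# the NEAR_DIST values are invented for illustration.
plots = pd.DataFrame({
    "ID": [59197, 40, 84, 92, 132, 7],
    "NEAR_DIST": [12.5, 99.0, 100.0, 40.2, 73.8, 450.0],
})

reporting_period = "8"  # same naming scheme as in the question
threshold = 100         # meters

recruise_col = "RP_" + reporting_period + "Recruise"
recrtype_col = "RP_" + reporting_period + "RecrType"

# Boolean mask, computed once for the whole column.
is_near = plots["NEAR_DIST"] <= threshold

# "Y" where the mask is True, "N" elsewhere.
plots[recruise_col] = np.where(is_near, "Y", "N")

# Start from a column of "PD" and keep it only where the mask is True;
# Series.where fills the remaining rows with a missing value.
plots[recrtype_col] = pd.Series("PD", index=plots.index).where(is_near)

print(plots)
```

Because the mask is evaluated for all rows at once, no row can overwrite another, and the result does not depend on row order.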