I am trying to run a knn regression. However, I have a lot of dummy variables and therefore a lot of ties. To solve this problem, I want to add noise to the dummies: rows with a 1 on a specific variable should get a random value between 0.99 and 1, and rows with a 0 should get a random value between 0 and 0.01. Can somebody help me with an efficient way to transform my dummy variables?
You can use ifelse() to transform your dummy variables:
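set.seed(4)
df <- data.frame(letter = letters[1:10], dummy = sample(0:1, 10, replace = TRUE))
# Draw one random value per row; runif(1, ...) would be recycled and leave the ties in place
n <- nrow(df)
df$newdummy <- ifelse(df$dummy == 1, runif(n, 0.99, 1), runif(n, 0, 0.01))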
Here I add a new column, but you can overwrite the existing one by assigning the result of ifelse() to the old dummy variable.
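If you have many dummy columns, you could wrap the transformation in a small function and apply it to all of them in place. The following is only a sketch under assumptions: the column names d1 and d2, the helper name jitter_dummy and the eps argument are made up for illustration and are not part of the original data. It draws one independent random value per row, so rows that share the same dummy value no longer tie exactly.

```r
set.seed(4)
# Hypothetical example data with two dummy columns (d1, d2 are assumed names)
df <- data.frame(letter = letters[1:10],
                 d1 = sample(0:1, 10, replace = TRUE),
                 d2 = sample(0:1, 10, replace = TRUE))

# Hypothetical helper: move 0s up and 1s down by a uniform draw in [0, eps]
jitter_dummy <- function(x, eps = 0.01) {
  noise <- runif(length(x), 0, eps)  # one independent draw per row
  ifelse(x == 1, 1 - noise, noise)
}

# Overwrite the listed dummy columns in place
dummy_cols <- c("d1", "d2")
df[dummy_cols] <- lapply(df[dummy_cols], jitter_dummy)
```

Keep in mind that the noise changes the distances knn works with, so it is probably worth checking that eps is small relative to the scale of your other predictors.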
However, I agree with @SamR's answer about dummy variables: it is not very clear what you want to do with the dummy variable.