I want to find outliers more than 3 SD away from the mean, which I can do with the function below. I would now like to extend the function so that it also replaces each outlier with mean + 3*sd + (participant's value - mean)/mean. Should I use a for loop for this? The loop I am trying to write is also given below. How can the function and the loop be merged, or is there another way to iterate over every row of the data (each participant's value) when replacing the outliers? In the end I want a new column holding the result of the function. If all of this can be done with dplyr's mutate() or other functions, I am open to any solution.
findingoutlier <- function(data, cutoff = 3, na.rm = TRUE) {
  s <- sd(data, na.rm = na.rm)
  m <- mean(data, na.rm = na.rm)
  # values more than `cutoff` SDs below or above the mean (which() drops NAs)
  outliers <- data[which(data < m - cutoff * s | data > m + cutoff * s)]
  return(outliers)
}
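# loop version: visit every position and replace it if it is an outlier
m <- mean(data, na.rm = TRUE)
s <- sd(data, na.rm = TRUE)
for (i in seq_along(data)) {
  if (!is.na(data[i]) && abs(data[i] - m) > 3 * s) {
    data[i] <- m + 3 * s + (data[i] - m) / m
  }
}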
# example data
bmi <- c(32.8999, 31.7826, 28.5573, 20.6350, 21.6311, NA, 29.6174, 52.7027, 58.5968, 30.1867, 28.7927, 26.4697, 42.0294, 27.1309, 56.3672, 62.6474, 34.1692, 31.5120, 29.8553, 34.4443, 25.4049, 25.7287, 71.3209, 23.5615, 19.9359, 21.7438, 51.9286, 22.1875, NA, 24.4389, 28.1571, 23.7093, 47.5551, 27.7767, 30.3237, NA, 20.7838, 34.1878, 25.1559, 25.8645, 24.9673, 27.5374, 28.5467, 25.0402, 22.1056, 28.0026, 26.7901, 21.5110, NA, 50.7599, NA, 32.6979, 26.5295, 25.5246, 23.9657, 20.1323, 28.0452)
eid <- 1:57
# a data frame (not a matrix from cbind) so that a new column can be added later
df <- data.frame(eid, bmi)
df
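`findingoutlier()` returns the outlying values themselves, so it cannot tell `replace()` or an assignment *where* those values sit. A minimal sketch of the detection step that returns positions instead, assuming the example `df` from the question; `is_outlier()` and `flag` are hypothetical names, not part of any package:

```r
# Example data from the question
bmi <- c(32.8999, 31.7826, 28.5573, 20.6350, 21.6311, NA, 29.6174, 52.7027,
         58.5968, 30.1867, 28.7927, 26.4697, 42.0294, 27.1309, 56.3672, 62.6474,
         34.1692, 31.5120, 29.8553, 34.4443, 25.4049, 25.7287, 71.3209, 23.5615,
         19.9359, 21.7438, 51.9286, 22.1875, NA, 24.4389, 28.1571, 23.7093,
         47.5551, 27.7767, 30.3237, NA, 20.7838, 34.1878, 25.1559, 25.8645,
         24.9673, 27.5374, 28.5467, 25.0402, 22.1056, 28.0026, 26.7901, 21.5110,
         NA, 50.7599, NA, 32.6979, 26.5295, 25.5246, 23.9657, 20.1323, 28.0452)
df <- data.frame(eid = 1:57, bmi = bmi)

# Hypothetical helper: like findingoutlier(), but returns a logical vector
# marking *where* the outliers are instead of returning their values.
is_outlier <- function(x, cutoff = 3, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  # FALSE & NA is FALSE, so missing values are never flagged
  !is.na(x) & abs(x - m) > cutoff * s
}

flag <- is_outlier(df$bmi)
df[flag, ]   # rows of the participants flagged as outliers
which(flag)  # their row numbers
```

Because `flag` has one entry per row, it can be used directly as an index on the left-hand side of an assignment, which removes the need for an explicit loop.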
The trick is that an index subset can be used not only on the right-hand side of an assignment (something to read from) but also on the left-hand side (something to write to), as follows:
m <- mean(data, na.rm = TRUE)
s <- sd(data, na.rm = TRUE)
# get the *indices* of the outliers (missing values are never flagged)
indices <- !is.na(data) & abs(data - m) > 3 * s
# compute the replacement for *every* value, using the formula from the question
replacement <- m + 3 * s + (data - m) / m
# replace *only* the outliers
data[indices] <- replacement[indices]
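The indexing approach above can be wrapped in a function and used to create the new column the question asks for. A minimal sketch, assuming the example data from the question; `replace_outliers()` and the column name `bmi_new` are hypothetical names chosen for illustration:

```r
# Example data from the question
bmi <- c(32.8999, 31.7826, 28.5573, 20.6350, 21.6311, NA, 29.6174, 52.7027,
         58.5968, 30.1867, 28.7927, 26.4697, 42.0294, 27.1309, 56.3672, 62.6474,
         34.1692, 31.5120, 29.8553, 34.4443, 25.4049, 25.7287, 71.3209, 23.5615,
         19.9359, 21.7438, 51.9286, 22.1875, NA, 24.4389, 28.1571, 23.7093,
         47.5551, 27.7767, 30.3237, NA, 20.7838, 34.1878, 25.1559, 25.8645,
         24.9673, 27.5374, 28.5467, 25.0402, 22.1056, 28.0026, 26.7901, 21.5110,
         NA, 50.7599, NA, 32.6979, 26.5295, 25.5246, 23.9657, 20.1323, 28.0452)
df <- data.frame(eid = 1:57, bmi = bmi)

# Hypothetical helper: returns a copy of x in which every value more than
# `cutoff` SDs from the mean is replaced by m + cutoff*s + (x - m)/m.
# With the default cutoff = 3 this is the formula from the question.
replace_outliers <- function(x, cutoff = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  idx <- !is.na(x) & abs(x - m) > cutoff * s   # positions of the outliers
  replacement <- m + cutoff * s + (x - m) / m  # candidate value for every row
  x[idx] <- replacement[idx]                   # overwrite only the outliers
  x
}

# New column; the original bmi column is left unchanged
df$bmi_new <- replace_outliers(df$bmi)
df[df$eid == 23, ]
```

Since the helper takes a whole vector and returns one of the same length, it should also work inside dplyr, e.g. `df <- dplyr::mutate(df, bmi_new = replace_outliers(bmi))`. Note that the mean and SD are computed over the whole column; after a `group_by()` they would be computed per group instead.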