I have a dataframe with the "id" of each individual and two traits ("x" and "y"), like the following:
id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x = c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y = c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a = data.frame(id,x,y)
I want a loop that iterates over each trait and each individual so that I can create a new dataframe (or new columns in a) in which an individual gets a 1 if it is an outlier and a 0 if it is not. I consider an outlier to be any point that lies beyond ± 3 sd from the mean of the trait.
In this example, an outlier for "x" is 28 and for "y" is 5. The required result then could be something like:
id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x_out = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)
y_out = c(0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
a_out = data.frame(id, x_out, y_out)
Any idea how to do it in a loop? The idea is that if I include new traits or individuals, I don't need to change the loop. Thanks!
No need for loops. You can just test whether the absolute z-score, abs(scale()), is >= 3 for all columns at once:
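a_out <- a
# scale() z-scores every trait column (everything except the first, "id");
# as.integer() turns the TRUE/FALSE outlier test into 1/0
a_out[, -1] <- as.integer(abs(scale(a[, -1])) >= 3)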
#> a_out
    id x y
1   A1 0 0
2   A2 0 0
3   A3 0 1
4   A4 0 0
5   A5 0 0
6   A6 0 0
7   A7 0 0
8   A8 0 0
9   A9 0 0
10 A10 0 0
11 A11 0 0
12 A12 0 0
13 A13 0 0
14 A14 0 0
15 A15 0 0
16 A16 0 0
17 A17 0 0
18 A18 0 0
19 A19 0 0
20 A20 0 0
21 A21 0 0
22 A22 0 0
23 A23 0 0
24 A24 1 0
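If the flags should sit in separate columns named x_out and y_out, as in the question's desired result, something along these lines might work. It is only a sketch: the helper names traits and flags are made up for illustration, and it assumes every column other than "id" is a numeric trait.

```r
# Data from the question
id <- paste0("A", 1:24)
x <- c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y <- c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a <- data.frame(id, x, y)

# Assumption: every column except "id" is a numeric trait
traits <- setdiff(names(a), "id")

# scale() centers each column on its mean and divides by its sd,
# so z holds one z-score per individual and trait
z <- scale(a[traits])

# Logical matrix times 1L gives an integer 0/1 matrix
flags <- as.data.frame((abs(z) >= 3) * 1L)
names(flags) <- paste0(traits, "_out")

# id plus one <trait>_out column per trait
a_out <- cbind(a["id"], flags)
```

Because traits is derived from names(a), adding a new trait column or more rows to a should not require changing this code.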
Or using dplyr:
library(dplyr)
a_out <- a %>%
  # across(!id, ...) applies the outlier test to every column except id
  mutate(across(!id, \(x) as.integer(abs(scale(x)) >= 3)))
# same output as above
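If an explicit loop is still preferred, as the question originally asked, the same test can be written out by hand. This is a rough sketch rather than a recommendation: it computes the z-score with mean() and sd() instead of scale(), uses na.rm = TRUE as an assumption about how missing values should be treated, and again assumes every column other than "id" is a numeric trait.

```r
# Data from the question
id <- paste0("A", 1:24)
x <- c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y <- c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a <- data.frame(id, x, y)

# Start from the id column and add one flag column per trait
a_out <- a["id"]
for (trait in setdiff(names(a), "id")) {
  values <- a[[trait]]
  # z-score: distance from the trait mean in units of the trait sd
  z <- (values - mean(values, na.rm = TRUE)) / sd(values, na.rm = TRUE)
  # 1 if the point is 3 sd or more from the mean, 0 otherwise
  a_out[[paste0(trait, "_out")]] <- as.integer(abs(z) >= 3)
}
```

The loop runs over column names rather than fixed positions, so new traits or individuals are picked up without edits.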