I have the following dataframe:
ID estimation1 estimation2
A 0.0234 0.0220
A 0.0234 3
A 0.0234 0.034
B -0.005 -1.89
B -0.005 0.03
B -0.005 -0.0052
C 0.10 -0.00067
C 0.10 -0.98
C 0.10 0.11
df <- structure(list(ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), estimation1 = c(0.0234, 0.0234, 0.0234, -0.005, -0.005, -0.005, 0.10, 0.10, 0.10), estimation2 = c(0.022, 3, 0.034, -1.89, 0.03, -0.0052, -0.00067, -0.98, 0.11)), class = "data.frame", row.names = c(NA,
-3L))
I would like to retain only the row in which estimation1 and estimation2 are quite similar, in this case only the first row, with the following output:
ID estimation1 estimation2
A 0.0234 0.0220
B -0.005 -0.0052
C 0.10 0.11
Is there a function in R being able to do something like that? Really thank you!
Update: After clarification:
One general way could be to group and find the lowest difference of the absolute value and filter thereafter:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(diff = abs(estimation2 - estimation1)) %>%
filter(diff == min(diff)) %>%
select(-diff)
ID estimation1 estimation2
<chr> <dbl> <dbl>
1 A 0.0234 0.022
2 B -0.005 -0.0052
3 C 0.1 0.11
First answer:
With base R we could subset by indicating the "similarity" here 0.02
:
df[abs(df$estimation1 - df$estimation2) < 0.02, ]
ID estimation1 estimation2
1 A 0.0234 0.022
or with dplyr
:
library(dplyr)
df %>% filter(abs(estimation1 - estimation2) < 0.02)