I have a dataset dt with a column named x, which contains numeric and unexpected values. My goal is to retrieve a value from a lookup table based on the value of x, without cleaning x first (because of the unexpected values), by using the dplyr::filter() function. If no row of the lookup table satisfies the condition, dplyr::filter() returns an empty tibble, and I want to replace that output with the value 0.0 as an element of the output vector.
Here is an example of my code:
library(dplyr)
dt <- tibble(x = c(0, -1, 0.5))
lookup_table <- tibble(
  lower_bound = c(0, 0.2, 0.5),
  upper_bound = c(0.2, 0.5, 1000000),
  output = c(0.1, 0.2, 0.3)
)
# Attempt: fall back to 0.0 when filter() returns no rows
y <- lookup_table %>%
  filter(lower_bound <= dt$x, upper_bound > dt$x) %>%
  select(output) %>%
  pull() %>%
  if_else(length() != 0,
          lookup_table %>% filter(lower_bound <= dt$x, upper_bound > dt$x) %>% select(output) %>% pull(),
          0.0)
y
# Expected output: [1] 0.1 0.0 0.3
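A likely reason the attempt above does not behave like a lookup: inside filter(), a condition such as lower_bound <= dt$x is evaluated element-wise, so row i of lookup_table is only ever compared with dt$x[i]. The sketch below reuses the same example data to show that the filter step keeps two rows instead of returning one value per x; it is illustrative only.

```r
library(dplyr)

dt <- tibble(x = c(0, -1, 0.5))
lookup_table <- tibble(
  lower_bound = c(0, 0.2, 0.5),
  upper_bound = c(0.2, 0.5, 1000000),
  output = c(0.1, 0.2, 0.3)
)

# Row i of lookup_table is compared with dt$x[i] only (element-wise),
# so row 2 is tested against -1 and dropped
matched <- lookup_table %>%
  filter(lower_bound <= dt$x, upper_bound > dt$x) %>%
  pull(output)
matched
#> [1] 0.1 0.3
```

The result has length 2, not 3. If dt had a different number of rows than lookup_table (other than 1), recent dplyr versions stop with a size error instead, because each filter condition must have length 1 or nrow(lookup_table).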
Thank you,
John
EDIT: Please note that dt and the lookup table do not necessarily have the same number of rows.
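Since the two tables can differ in length, one option that stays close to the dplyr::filter() idea is to run the filter once per value of x and fall back to 0.0 whenever it returns no rows. This is a minimal sketch: the helper lookup_one() is a hypothetical name, and it assumes x is numeric, with the unexpected values being things like NA or out-of-range numbers (an NA is added to dt here as a stand-in). An NA comparison drops every row, so NA also maps to 0.0.

```r
library(dplyr)

dt <- tibble(x = c(0, -1, 0.5, NA))  # NA stands in for an unexpected value
lookup_table <- tibble(
  lower_bound = c(0, 0.2, 0.5),
  upper_bound = c(0.2, 0.5, 1000000),
  output = c(0.1, 0.2, 0.3)
)

# Hypothetical helper: return the output of the row with
# lower_bound <= v < upper_bound, or `default` when filter() finds none
lookup_one <- function(v, table, default = 0.0) {
  hit <- table %>%
    filter(lower_bound <= .env$v, upper_bound > .env$v) %>%
    pull(output)
  if (length(hit) == 0) default else hit[1]
}

# One filter() call per element of dt$x, so the row counts need not match
y <- vapply(dt$x, lookup_one, numeric(1), table = lookup_table)
y
#> [1] 0.1 0.0 0.3 0.0
```

This loops over x in R, so it will be slower than a vectorised approach on large data, but it keeps the "empty filter result becomes 0.0" logic explicit.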
I think I found a solution: combine the base R functions findInterval() and gsub() with the dplyr and tidyr packages.
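To see what the two base functions contribute: findInterval(x, vec) returns, for each x, the index i of the last element of the ascending vector vec with vec[i] <= x, and 0 when x is below vec[1]. Turning that 0 into NA (here with gsub(), whose replacement = NA sets matching elements to NA) lets the row lookup return NA, which replace_na() can then turn into 0. A small illustration; note that an unanchored pattern "0" also matches indices such as 10, so the anchored pattern "^0$" is the safer choice.

```r
# Breakpoints must be sorted in ascending order for findInterval()
lower <- c(0, 0.2, 0.5)
idx <- findInterval(c(0, -1, 0.5), lower)
idx
#> [1] 1 0 3

# Unanchored pattern: "10" also contains a 0, so it becomes NA as well
unanchored <- gsub(pattern = "0", replacement = NA, x = c(1, 0, 10))
unanchored

# Anchored pattern: only an index that is exactly 0 becomes NA
anchored <- as.numeric(gsub(pattern = "^0$", replacement = NA, x = c(1, 0, 10)))
anchored
#> [1]  1 NA 10
```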
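library(dplyr)
library(tidyr)
dt <- tibble(x = c(0, -1, 0.5))
lookup_table <- tibble(
  lower_bound = c(0, 0.2, 0.5),
  upper_bound = c(0.2, 0.5, 1000000),
  output = c(0.1, 0.2, 0.3)
)
# findInterval() needs the breakpoints sorted in ascending order
sorted_table <- arrange(lookup_table, lower_bound)
# Index of the interval each x falls into; 0 means x is below every lower_bound
idx <- findInterval(dt$x, sorted_table$lower_bound)
# Turn index 0 into NA; the anchored pattern leaves indices such as 10 untouched
idx <- as.numeric(gsub(pattern = "^0$", replacement = NA, x = idx))
# NA indices yield NA rows, which replace_na() turns into 0
y <- sorted_table[idx, "output"] %>% pull() %>% replace_na(0)
y
# Actual output: [1] 0.1 0.0 0.3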
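One limitation of the findInterval() approach: it only looks at lower_bound, so upper_bound is ignored and any x at or above the last lower bound (for example 2000000) still gets 0.3 rather than 0.0. If the upper bounds matter, an inequality join is one alternative. The sketch below assumes dplyr >= 1.1.0, which introduced join_by(), and uses coalesce() to turn unmatched rows into 0.0; the extra value 2000000 in dt is made up for illustration.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

dt <- tibble(x = c(0, -1, 0.5, 2000000))
lookup_table <- tibble(
  lower_bound = c(0, 0.2, 0.5),
  upper_bound = c(0.2, 0.5, 1000000),
  output = c(0.1, 0.2, 0.3)
)

# Keep every row of dt and attach the lookup row whose interval
# [lower_bound, upper_bound) contains x; unmatched rows get NA output
y <- dt %>%
  left_join(lookup_table, by = join_by(x >= lower_bound, x < upper_bound)) %>%
  mutate(output = coalesce(output, 0)) %>%
  pull(output)
y
#> [1] 0.1 0.0 0.3 0.0
```

This works for any number of rows in dt and lookup_table, provided the intervals do not overlap; overlapping intervals would make some x match several rows and duplicate them in the result.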