Tags: r, filter, dplyr, tibble

How to replace an empty output from dplyr::filter()


I have a dataset dt with a column named x that contains numeric values along with some unexpected values. My goal is to retrieve a value from a lookup table based on the value of x using the dplyr::filter() function, without cleaning x first (because of the unexpected values). When no row of the lookup table satisfies the condition, dplyr::filter() returns an empty tibble, and I want to replace that empty result with 0.0, so that the output is a numeric vector with one value per element of x.

Here is an example of my code:

    library(dplyr)

    dt <- tibble(x = c(0, -1, 0.5))
    lookup_table <- tibble(
      lower_bound = c(0, 0.2, 0.5),
      upper_bound = c(0.2, 0.5, 1000000),
      output = c(0.1, 0.2, 0.3)
    )
    # This attempt fails: the pipe passes the pulled vector as if_else()'s
    # first argument, and length() is called with no argument
    y <- lookup_table %>%
      filter(lower_bound <= dt$x, upper_bound > dt$x) %>%
      select(output) %>%
      pull() %>%
      if_else(length() != 0, lookup_table %>% filter(lower_bound <= dt$x, upper_bound > dt$x) %>% select(output) %>% pull(), 0.0)
    y
    # [1] 0.1 0.0 0.3   (expected output)

Thank you,

John

EDIT: Please note that dt and the lookup table do not necessarily have the same number of rows.
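For illustration (not part of the original question): inside filter(), `lower_bound <= dt$x` compares the i-th row of the lookup table with the i-th element of dt$x, so it pairs values by position instead of searching the whole table for each x, and it fails or misbehaves once the two lengths differ. A minimal base R sketch of the behaviour being asked for, one lookup per value of x with 0.0 as the fallback, might look like the following. The helper lookup_one() and the extra test value 0.3 are hypothetical, the intervals are assumed to be half-open [lower_bound, upper_bound) as in the question's filter() condition, and a plain data.frame is used so the sketch runs without dplyr (a tibble works the same way with `$`).

```r
# Lookup table from the question (as a base data.frame)
lookup_table <- data.frame(
  lower_bound = c(0, 0.2, 0.5),
  upper_bound = c(0.2, 0.5, 1000000),
  output      = c(0.1, 0.2, 0.3)
)

# Values to look up; deliberately a different length than lookup_table
x <- c(0, -1, 0.5, 0.3)

# Hypothetical helper: return the output of the first row whose
# [lower_bound, upper_bound) interval contains v, or 0.0 if none does
lookup_one <- function(v, table) {
  hit <- table$output[table$lower_bound <= v & table$upper_bound > v]
  if (length(hit) == 0) 0.0 else hit[1]
}

# Apply the lookup to every value of x, always getting one number back
y <- vapply(x, lookup_one, numeric(1), table = lookup_table)
y
# [1] 0.1 0.0 0.3 0.2
```

Because each value is looked up on its own, the number of rows in dt and in the lookup table no longer matters; the cost is one scan of the lookup table per value of x.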


Solution

  • I think I found the solution. We can combine the base R functions findInterval() and gsub() with the dplyr and tidyr packages.

    library(dplyr)
    library(tidyr)
    
    
    dt <- tibble(x = c(0, -1, 0.5))
    lookup_table <- tibble(
             lower_bound = c(0, 0.2, 0.5),
             upper_bound = c(0.2, 0.5, 1000000),
             output = c(0.1, 0.2, 0.3)
            )
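    # Sort by lower_bound: findInterval() needs ascending breakpoints
    sorted_lookup <- arrange(lookup_table, lower_bound)
    # Index of the interval each x falls into (0 = below every lower_bound)
    idx <- findInterval(dt$x, sorted_lookup$lower_bound)
    # gsub() works on the character form of idx and turns the 0s into NA
    idx <- as.numeric(gsub(pattern = 0, replacement = NA, idx))
    # Indexing with NA yields NA, which replace_na() turns into 0
    y <- sorted_lookup[idx, "output"] %>% pull() %>% replace_na(0)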
    y
    # [1] 0.1 0.0 0.3   (actual output)
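Two caveats about the gsub() step: it matches the character "0" anywhere in the index, so with ten or more lookup rows an index such as 10 would also become NA, and findInterval() never looks at upper_bound, so a value above the last upper_bound still gets the last output. A minimal base R sketch of the same findInterval() idea that avoids both issues is shown below. It assumes the lookup intervals do not overlap, and the extra test value 2e6 is hypothetical, chosen to fall above every upper_bound.

```r
# Lookup table from the question (as a base data.frame)
lookup_table <- data.frame(
  lower_bound = c(0, 0.2, 0.5),
  upper_bound = c(0.2, 0.5, 1000000),
  output      = c(0.1, 0.2, 0.3)
)
x <- c(0, -1, 0.5, 2e6)

# findInterval() needs ascending breakpoints
sorted_lookup <- lookup_table[order(lookup_table$lower_bound), ]

# Row index of the last lower_bound <= x; 0 when x is below every lower_bound
idx <- findInterval(x, sorted_lookup$lower_bound)

# Mark "no interval" as NA numerically instead of via gsub()
idx[idx == 0] <- NA

# Also reject values at or beyond the matched row's upper_bound
idx[!is.na(idx) & x >= sorted_lookup$upper_bound[idx]] <- NA

# Indexing with NA gives NA; replace those with the default 0.0
y <- sorted_lookup$output[idx]
y[is.na(y)] <- 0.0
y
# [1] 0.1 0.0 0.3 0.0
```

With dplyr loaded, `na_if(idx, 0)` is another way to write the `idx == 0` step. The upper_bound check is what makes 2e6 fall back to 0.0 rather than picking up the last row's output.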