r, dplyr

Conditionally adding a row within a pipe


I have a tibble containing events and their probabilities. The number of outcomes can be two (Yes/No) or more (A/B/C/...). In the latter case, I have an exhaustive list of events, so I want to do nothing:

ok <- tibble(
    event = c("A", "B", "C"), 
    prob = c(0.1, 0.5, 0.4)
)

If there are only two events, I only have one row, for the probability of the event occurring:

not_ok <- tibble(
    event = "Yes", 
    prob = 0.4
)

These tibbles are modified in a pipe chain. At a specific point in the chain, I want to add a "No" row to tibbles of the second type.

Currently, I'm interrupting the pipe to do:

# %<>% is magrittr's assignment pipe, so library(magrittr) must be attached
if (nrow(not_ok) == 1) {
    not_ok %<>%
        add_row(event = "No", prob = 1 - not_ok$prob)
}

And then I resume the pipe. However, this is slow, ugly, and requires extra assignments.

Is it possible to include this conditional statement inside the pipe chain, without separately creating an if statement? The code in the end should look like:

data %>% 
    do something %>% 
    add row with "No" if necessary %>% 
    do something else %>% 
    plot

If possible, I would like to avoid using global assignments or functions.


For clarity: the data comes from a server request, so I do not know whether a specific request pulls a tibble of the type ok or not_ok. I need a single operation that works in both cases, by doing nothing in the first case and adding a row in the second. What I'm doing currently works because I use an if statement to edit not_ok only when that is the result of the request. For example:

fread("...") %>% 
    mutate(...) %>% 
    add row if nrow == one

Solution

  • Here's a solution using tidyr's complete():

    not_ok %>% 
      complete(
        event = c("Yes", "No"),
        fill = list(prob = (1 - sum(not_ok$prob)))
      )
    # A tibble: 2 × 2
      event  prob
      <chr> <dbl>
    1 No      0.6
    2 Yes     0.4
    

    whereas ok is left unchanged:

    ok %>% 
      complete(
        event = c("A", "B", "C"),
        fill = list(prob = (1 - sum(ok$prob)))
      )
    # A tibble: 3 × 2
      event  prob
      <chr> <dbl>
    1 A       0.1
    2 B       0.5
    3 C       0.4
    

    If your tibble is grouped, or you need more sophisticated imputation, a minor variation on this should suffice.
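As one guess at what such a variation might look like for grouped data, the sketch below completes each group separately and then fills the missing probability per group. The `request` column and the `requests` tibble are hypothetical, invented only for illustration, and the fill is done with `mutate()` and `coalesce()` rather than with the `fill` argument:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data: two separate requests, each with only a "Yes" row
requests <- tibble(
  request = c(1, 2),
  event   = c("Yes", "Yes"),
  prob    = c(0.4, 0.7)
)

result <- requests %>%
  group_by(request) %>%
  # Completion happens within each group, leaving prob as NA for new rows
  complete(event = c("Yes", "No")) %>%
  # Replace the NA with the probability left over in that group
  mutate(prob = coalesce(prob, 1 - sum(prob, na.rm = TRUE))) %>%
  ungroup()
```

Each request should end up with a "Yes" and a "No" row whose probabilities sum to 1.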

    Edit

    In response to OP's comment, here's how to encapsulate the functionality in a pipe-friendly function.

    make_complete <- function(df, event_list) {
      # Any event missing from df gets the probability left over after summing df$prob
      df %>% 
        complete(
          event = event_list,
          fill = list(prob = (1 - sum(df$prob)))
        )
    }
    

    After which

    not_ok %>% make_complete(c("Yes", "No"))
    

    and

    ok %>% make_complete(c("A", "B", "C"))
    

    Both produce the expected output above. The function can clearly be embedded in a longer pipe.
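As a rough sketch of what that embedding might look like, the example below places make_complete between two other steps. The `mutate()` and `arrange()` steps and the percentage input are placeholders invented for illustration; they are not part of the original question:

```r
library(dplyr)
library(tidyr)
library(tibble)

make_complete <- function(df, event_list) {
  df %>%
    complete(
      event = event_list,
      fill = list(prob = 1 - sum(df$prob))
    )
}

result <- tibble(event = "Yes", prob = 40) %>%
  mutate(prob = prob / 100) %>%        # hypothetical earlier step: percent to proportion
  make_complete(c("Yes", "No")) %>%    # adds the "No" row if it is missing
  arrange(desc(prob))                  # hypothetical later step
```

Because the fill value is computed from the function argument `df`, it reflects the data as it arrives in the pipe rather than a separately stored object.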

    With regard to your comment "I would like to avoid using ... functions": I don't see how you can make this generic without using a function.
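If a named helper is the sticking point, one possible inline variant is to wrap the condition in braces, so that `.` refers to the incoming tibble. This assumes magrittr's `%>%` (the base `|>` pipe does not support this form), and strictly speaking the braces still behave like an anonymous function. It is a sketch of the row-count check from the question, not a way around functions altogether:

```r
library(dplyr)   # provides %>%
library(tibble)  # provides tibble() and add_row()

ok <- tibble(event = c("A", "B", "C"), prob = c(0.1, 0.5, 0.4))
not_ok <- tibble(event = "Yes", prob = 0.4)

# One row: append the complementary "No" row. Otherwise pass the data through.
result_not_ok <- not_ok %>%
  { if (nrow(.) == 1) add_row(., event = "No", prob = 1 - .$prob) else . }

result_ok <- ok %>%
  { if (nrow(.) == 1) add_row(., event = "No", prob = 1 - .$prob) else . }
```

The same braces expression can sit between any two steps of a longer `%>%` chain, since it returns a tibble in both branches.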