Search code examples
rdplyrmissing-datafillcoalesce

Why does dplyr's coalesce(.) and fill(.) not work and still leave missing values?


I have a simple test dataset that has many repeating rows for participants. I want one row per participant that doesn't have NAs, unless the participant has NAs for the entire column. I tried grouping by participant name and then using coalesce(.) and fill(.), but it still leaves missing values. Here's my test dataset:

library(dplyr)
library(tibble)

test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
                       var1 = c(rep(c(NA), 10), 2, 3),
                       var2 = c(rep(c(NA), 9), 2, 4, 6),
                       var3 = c(10, 15, 7, rep(c(NA), 9)),
                       outcome = c(3, 9, 23, rep(c(NA), 9)),
                       tenure = rep(c(10, 15, 20), 4))

And here's what I get when I use coalesce(.) or fill(., direction = "downup"), which both produce the same result.

library(dplyr)
library(tibble)

test_dataset_coalesced <- test_dataset %>% 
  group_by(name) %>%
  coalesce(.) %>%
  slice_head(n=1) %>%
  ungroup()

test_dataset_filled <- test_dataset %>% 
  group_by(name) %>%
  fill(., .direction="downup") %>%
  slice_head(n=1) %>%
  ungroup()

And here's what I want--note, there is one NA because that participant only has NA for that column:

library(tibble)


correct <- tibble(name = c("Justin", "Corey", "Sibley"),
                  var1 = c(NA, 2, 3),
                  var2 = c(2, 4, 6),
                  var3 = c(10, 15, 7),
                  outcome = c(3, 9, 23),
                  tenure = c(10, 15, 20))

Solution

  • You can group_by the name column, then fill the NA (you need to fill every column using everything()) with the non-NA values within the group, then only keep the distinct rows.

    library(tidyverse)
    
    test_dataset %>% 
      group_by(name) %>% 
      fill(everything(), .direction = "downup") %>% 
      distinct()
    
    # A tibble: 3 × 6
    # Groups:   name [3]
      name    var1  var2  var3 outcome tenure
      <chr>  <dbl> <dbl> <dbl>   <dbl>  <dbl>
    1 Justin    NA     2    10       3     10
    2 Corey      2     4    15       9     15
    3 Sibley     3     6     7      23     20