Search code examples
rfilteringgroupingextrapolation

Identify groups with all missing values for a variable and extrapolate within groups in R


I want to identify the groups that only have missing values for a given variable. My data looks like this:

id <- seq(1:4)
year <- c(2005, 2006, 2007, 2008)
y <- c(6, 11, NA, NA, NA, NA, NA, NA, 9, NA, NA, 7, 8, 7, NA, 12)
test <- data.frame(id, year, y)
arrange(test, desc(id))
print(test)

I tried to group the data by id and created allNA variable that should separate groups that only have missing values (I need to preserve those rows untouched in the final result as well). Then I extrapolated within the groups that have some non-missing values:

test %>% 
  group_by(id) %>% 
  mutate(allNA = !all(is.na(y))) %>%
  filter(allNA) %>%
  select(-allNA) %>%
  mutate(y2=na.spline(y))

I keep getting inconsistent results for my extrapolation. Sometimes after creating allNA variable, I can see some FALSE values and it leaves the year 2007's values as missing or drop them, but sometimes when I execute the code it is all TRUEs and it extrapolate everything. I wonder if the allNA variable is doing the job it is supposed to be doing. Because of that I decided to first identify the groups _ here year 2007 _ that just have NAs but I don't know how to do that and solve the extrapolation problem.


Solution

  • After creating the column allNA, use if/else to apply the na.spline

    library(dplyr)
    library(zoo)
    test %>% 
        group_by(id) %>%
        mutate(allNA = all(is.na(y)),
                y2 = if(!allNA[1]) na.spline(y) else y)