Search code examples
rdplyrmissing-data

How to calculate percent change in R when there are some years of data missing?


I'm calculating the percent change of enrollment from academic year to academic year, but there are some academic years missing data, so I don't want it to calculate the change in those instances and keep it as blank instead of calculating a two year difference. I have multiple years, schools, and groups I am doing this by. Example data frame below and the code I am using currently. So I am missing 2016-17 in this example and don't want to calculate it for 17-18 then.

 School      Academic Year  Group   Enrollment pct_change  
 1 School 1  2018-19        Overall 450        ANSWER 
 2 School 1  2017-18        Overall 630        NA
 3 School 1  2015-16        Overall 635        ANSWER
 4 School 1  2014-15        Overall 750        ANSWER
 5 School 1  2013-14        Overall 704        ANSWER 

data <- data %>%
  group_by(School, Group) %>% 
  mutate(pct_change = (((Enrollment-lead(Enrollment, order_by = `Academic Year`))/Enrollment)) * 100) %>%
  ungroup()

Solution

  • An option may be to expand the data for complete year

    library(dplyr)
    library(tidyr)
    data %>%
        separate(`Academic Year`, into = c("Year", "Day"), 
    remove = FALSE, convert = TRUE) %>%
        group_by(School, Group) %>%
        complete(Year = full_seq(Year, period = 1)) %>%
         mutate(pct_change = (((Enrollment-lead(Enrollment, 
        order_by = Year))/Enrollment)) * 100) %>%
      ungroup()
        filter(complete.cases(Enrollment)) %>%
        select(-Year, -Day)