I have a longitudinal data that I want to extrapolate one of its variables. I need to group the data by organizations' IDs and keep those groups/organizations that have at least one non-missing value for that variable (budget). It means that the organizations should be extrapolated individually, due to the longitudinal nature of data. I have used lots of different functions and codes but none of them worked. One of the simplest that I feel it should be working is the following:
DataX %>%
group_by(orgcode) %>%
filter(!is.na(budget) >= 1) %>%
mutate(budget=na.spline(budget))
The problem is that the filter only keeps the non-missing rows and does not take the grouping into account, so it cannot execute extrapolation. Do you know what I am doing wrong? Thank you so much!
This would be better if we had a reproducible example, but let's create a toy version of your data:
DataX <- data.frame(orgcode = rep(LETTERS[1:5], each = 3),
budget = c(NA, 21000, 22000,
30000, NA, 40000,
NA, NA, NA,
12000, 15000, 14000,
NA, NA, NA))
DataX
#> orgcode budget
#> 1 A NA
#> 2 A 21000
#> 3 A 22000
#> 4 B 30000
#> 5 B NA
#> 6 B 40000
#> 7 C NA
#> 8 C NA
#> 9 C NA
#> 10 D 12000
#> 11 D 15000
#> 12 D 14000
#> 13 E NA
#> 14 E NA
#> 15 E NA
We can see that organizations with the orgcode
C and E have all NA
values and should be removed. We can do this by using a dummy variable to find out whether each group is all(is.na(budget))
and filter on that:
library(dplyr)
DataX %>%
group_by(orgcode) %>%
mutate(allNA = !all(is.na(budget))) %>%
filter(allNA) %>%
select(-allNA)
#> # A tibble: 9 x 2
#> # Groups: orgcode [3]
#> orgcode budget
#> <fct> <dbl>
#> 1 A NA
#> 2 A 21000
#> 3 A 22000
#> 4 B 30000
#> 5 B NA
#> 6 B 40000
#> 7 D 12000
#> 8 D 15000
#> 9 D 14000
Created on 2020-07-29 by the reprex package (v0.3.0)