Search code examples
rfiltergroupingextrapolationlongitudinal

Pick groups that have at least one non-missing value in R


I have a longitudinal data that I want to extrapolate one of its variables. I need to group the data by organizations' IDs and keep those groups/organizations that have at least one non-missing value for that variable (budget). It means that the organizations should be extrapolated individually, due to the longitudinal nature of data. I have used lots of different functions and codes but none of them worked. One of the simplest that I feel it should be working is the following:

DataX %>%
group_by(orgcode) %>%
filter(!is.na(budget) >= 1) %>%
mutate(budget=na.spline(budget))

The problem is that the filter only keeps the non-missing rows and does not take the grouping into account, so it cannot execute extrapolation. Do you know what I am doing wrong? Thank you so much!


Solution

  • This would be better if we had a reproducible example, but let's create a toy version of your data:

    DataX <- data.frame(orgcode = rep(LETTERS[1:5], each = 3),
                        budget  = c(NA, 21000, 22000,
                                    30000, NA, 40000,
                                    NA, NA, NA,
                                    12000, 15000, 14000,
                                    NA, NA, NA))
    
    DataX
    #>    orgcode budget
    #> 1        A     NA
    #> 2        A  21000
    #> 3        A  22000
    #> 4        B  30000
    #> 5        B     NA
    #> 6        B  40000
    #> 7        C     NA
    #> 8        C     NA
    #> 9        C     NA
    #> 10       D  12000
    #> 11       D  15000
    #> 12       D  14000
    #> 13       E     NA
    #> 14       E     NA
    #> 15       E     NA
    

    We can see that organizations with the orgcode C and E have all NA values and should be removed. We can do this by using a dummy variable to find out whether each group is all(is.na(budget)) and filter on that:

    library(dplyr)
    
    DataX %>% 
      group_by(orgcode) %>% 
      mutate(allNA = !all(is.na(budget))) %>%
      filter(allNA) %>%
      select(-allNA)
    
    #> # A tibble: 9 x 2
    #> # Groups:   orgcode [3]
    #>   orgcode budget
    #>   <fct>    <dbl>
    #> 1 A           NA
    #> 2 A        21000
    #> 3 A        22000
    #> 4 B        30000
    #> 5 B           NA
    #> 6 B        40000
    #> 7 D        12000
    #> 8 D        15000
    #> 9 D        14000
    

    Created on 2020-07-29 by the reprex package (v0.3.0)