Search code examples
rpanel

Build a balanced panel keeping observations that are repeated every year


I would like to keep only the complete observations for all the years, how can I proceed?

I have the following example:

structure(list(variable = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 
5, 5), Year = c(2010, 2011, 2012, 2010, 2012, 2010, 2011, 2012, 
2011, 2012, 2010, 2011, 2012)), class = "data.frame", row.names = c(NA, 
-13L))

I would like to get:

structure(list(variable = c(1, 1, 1, 3, 3, 3, 5, 5, 5), Year = c(2010, 
2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012)), row.names = c(1L, 
2L, 3L, 6L, 7L, 8L, 11L, 12L, 13L), class = "data.frame")

The example is simple, but I need to do this for a vast dataset with the idea of building a balanced dashboard. I appreciate any help.


Solution

  • In base R, we can use subset with table

    yr <- unique(df$Year)
    subset(df, variable %in% names(which(table(variable[Year %in% yr]) == 
           length(yr))))
    

    Or with dplyr, grouped by 'variable', filter those variables having number of distinct 'Year' (n_distinct) same as those in the entire data

    library(dplyr)
    df %>%
        group_by(variable) %>% 
        filter(n_distinct(Year) == n_distinct(.$Year)) %>% 
        ungroup
    # A tibble: 9 x 2
      variable  Year
         <dbl> <dbl>
    1        1  2010
    2        1  2011
    3        1  2012
    4        3  2010
    5        3  2011
    6        3  2012
    7        5  2010
    8        5  2011
    9        5  2012