I would like to keep only the complete observations for all the years, how can I proceed?
I have the following example:
structure(list(variable = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5,
5, 5), Year = c(2010, 2011, 2012, 2010, 2012, 2010, 2011, 2012,
2011, 2012, 2010, 2011, 2012)), class = "data.frame", row.names = c(NA,
-13L))
I would like to get:
structure(list(variable = c(1, 1, 1, 3, 3, 3, 5, 5, 5), Year = c(2010,
2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012)), row.names = c(1L,
2L, 3L, 6L, 7L, 8L, 11L, 12L, 13L), class = "data.frame")
The example is simple, but I need to do this for a vast dataset with the idea of building a balanced dashboard. I appreciate any help.
In base R
, we can use subset
with table
yr <- unique(df$Year)
subset(df, variable %in% names(which(table(variable[Year %in% yr]) ==
length(yr))))
Or with dplyr
, grouped by 'variable', filter
those variables having number of distinct 'Year' (n_distinct
) same as those in the entire data
library(dplyr)
df %>%
group_by(variable) %>%
filter(n_distinct(Year) == n_distinct(.$Year)) %>%
ungroup
# A tibble: 9 x 2
variable Year
<dbl> <dbl>
1 1 2010
2 1 2011
3 1 2012
4 3 2010
5 3 2011
6 3 2012
7 5 2010
8 5 2011
9 5 2012