I have prepared an example data.table:
testTable <- data.table(years = rep(c(rep((2014),3),rep((2015),3), rep((2016),3)), 2),
policy = c(rep("A", 9), rep("B",9)),
destination = rep(c("Paris", "London", "Berlin"), 6))
testTable[c(1,5,8), destination := c("Moskaw", "Milano", "Valencia")]
> testTable
years policy destination
1: 2014 A Moskaw
2: 2014 A London
3: 2014 A Berlin
4: 2015 A Paris
5: 2015 A Milano
6: 2015 A Berlin
7: 2016 A Paris
8: 2016 A Valencia
9: 2016 A Berlin
10: 2014 B Paris
11: 2014 B London
12: 2014 B Berlin
13: 2015 B Paris
14: 2015 B London
15: 2015 B Berlin
16: 2016 B Paris
17: 2016 B London
18: 2016 B Berlin
Here, i only want to keep observations that have the same destination
over all years available in the data. In this example i have chosen policies with 3 years only, but the real data may as well have 2,3 and 4 years of history mixed in a single data.table.
The desired result is:
> testTable
years policy destination
3: 2014 A Berlin
6: 2015 A Berlin
9: 2016 A Berlin
10: 2014 B Paris
11: 2014 B London
12: 2014 B Berlin
13: 2015 B Paris
14: 2015 B London
15: 2015 B Berlin
16: 2016 B Paris
17: 2016 B London
18: 2016 B Berlin
Any ides?
I tried using dcast()
, and then i wanted to filter those rows that have the same entries in all columns after policy
, however i realised that dcast()
automatically converts my character variable destination
into numeric and aggregates my data using length:
Aggregate function missing, defaulting to 'length'
NOTE: my data will have couple of hundreds of observations.
We can filter those common destination
which is present in each years
for each policy
.
library(data.table)
testTable[testTable[, destination %in%
Reduce(intersect, split(destination, years)), policy]$V1]
# years policy destination
# 1: 2014 A Berlin
# 2: 2015 A Berlin
# 3: 2016 A Berlin
# 4: 2014 B Paris
# 5: 2014 B London
# 6: 2014 B Berlin
# 7: 2015 B Paris
# 8: 2015 B London
# 9: 2015 B Berlin
#10: 2016 B Paris
#11: 2016 B London
#12: 2016 B Berlin
and in dplyr
:
library(dplyr)
testTable %>%
group_by(policy) %>%
filter(destination %in% Reduce(intersect, split(destination, years)))