I have the following two tables:
Table_1
ID Interval
1 10
1 11
2 11
and
Table_2
ID Interval Rating
1 10 0.5
1 10 0.3
1 11 0.1
2 11 0.1
2 11 0.2
The output table should look like this:
ID Interval Mean Ratings
1 10 0.4
1 11 0.1
2 11 0.15
My goal is to join both tables based on the two conditions/columns ID and interval. Given that I have several ratings for the same ID and interval, I want to compute the mean of the ratings. Whereas the IDs are unique (~9500), the interval repeats for different IDs (as seen in the table above). My current approach is the join function with 2 by arguments. How can I create a final table in which Table_1 and Table_2 are joined based on the condition ID and interval, and receive the average rating in the result column?
left_join(Table_1, Table_2, by = c("ID" = "ID", "Interval" = "Interval"))
First of all you would need to summarize second table DT2
and then perform a right join with first table DT1
.
library(data.table)
DT1[DT2[, .(Mean_Rating = mean(Rating)), .(ID, Interval)], on = c(ID = "ID", Interval = "Interval")]
which gives
ID Interval Mean_Rating
1: 1 10 0.40
2: 1 11 0.10
3: 2 11 0.15
Sample data:
DT1 <- structure(list(ID = c(1L, 1L, 2L), Interval = c(10L, 11L, 11L
)), .Names = c("ID", "Interval"), class = c("data.table", "data.frame"
), row.names = c(NA, -3L))
DT2 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Interval = c(10L,
10L, 11L, 11L, 11L), Rating = c(0.5, 0.3, 0.1, 0.1, 0.2)), .Names = c("ID",
"Interval", "Rating"), class = c("data.table", "data.frame"), row.names = c(NA,
-5L))