I'm looking to get the difference in counts by version by app_name. My dataset looks like this: app_name, version_id, count, [difference]
Here is the dataset
data = structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1,
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L,
200L, 200L, 250L, 250L, 15L, 36L)), .Names = c("app_name", "version_id",
"count"), class = "data.frame", row.names = c(NA, -9L))
Given this data.frame, how can I get the lagged difference in count by both app_name & version_id? the initial (first) version diff for each app would be zero, since there would be no difference.
Here is an example of what the final results would look like with that final 'diff' column
structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1,
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L,
200L, 200L, 250L, 250L, 15L, 36L), diff = c(0, 20, 0, 0, 0, 1.25,
0, 0, 2.4)), .Names = c("app_name", "version_id", "count", "diff"
), class = "data.frame", row.names = c(NA, -9L))
Try using dplyr
and lag
:
library(dplyr)
data %>% group_by(app_name) %>%
mutate(diffvers = version_id - dplyr::lag(version_id, default = version_id[1]),
diffcount = count - dplyr::lag(count, default = count[1]))
Source: local data frame [9 x 5]
Groups: app_name [3]
app_name version_id count diffvers diffcount
(fctr) (dbl) (int) (dbl) (int)
1 a 1.0 600 0.0 0
2 a 1.1 620 0.1 20
3 a 2.3 620 1.2 0
4 b 2.0 200 0.0 0
5 b 3.1 200 1.1 0
6 b 3.3 250 0.2 50
7 b 4.0 250 0.7 0
8 c 1.1 15 0.0 0
9 c 2.4 36 1.3 21