Search code examples
rdataframedifflag

Difference between rows in R on dataframe grouped by column


I'm looking to get the difference in counts by version by app_name. My dataset looks like this: app_name, version_id, count, [difference]

Here is the dataset

    data = structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L)), .Names = c("app_name", "version_id", 
"count"), class = "data.frame", row.names = c(NA, -9L))

Given this data.frame, how can I get the lagged difference in count by both app_name & version_id? the initial (first) version diff for each app would be zero, since there would be no difference.

Here is an example of what the final results would look like with that final 'diff' column

structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L), diff = c(0, 20, 0, 0, 0, 1.25, 
0, 0, 2.4)), .Names = c("app_name", "version_id", "count", "diff"
), class = "data.frame", row.names = c(NA, -9L))

Solution

  • Try using dplyr and lag:

    library(dplyr)
    data %>% group_by(app_name) %>%
             mutate(diffvers = version_id - dplyr::lag(version_id, default = version_id[1]),
                    diffcount = count - dplyr::lag(count, default = count[1]))
    
    Source: local data frame [9 x 5]
    Groups: app_name [3]
    
      app_name version_id count diffvers diffcount
        (fctr)      (dbl) (int)    (dbl)     (int)
    1        a        1.0   600      0.0         0
    2        a        1.1   620      0.1        20
    3        a        2.3   620      1.2         0
    4        b        2.0   200      0.0         0
    5        b        3.1   200      1.1         0
    6        b        3.3   250      0.2        50
    7        b        4.0   250      0.7         0
    8        c        1.1    15      0.0         0
    9        c        2.4    36      1.3        21