I have a data.frame like this:
set.seed(123)
df <- data.frame(group = sample(c("a", "b", "c"), 20, replace = TRUE),
                 value = round(runif(20) * 10))
# output
group value
1 c 10
2 c 7
3 c 7
4 b 5
5 c 6
6 b 3
7 b 1
8 b 10
9 c 9
10 a 7
11 b 8
12 b 0
13 a 5
14 b 8
15 c 2
16 a 3
17 c 2
18 c 1
19 a 4
20 a 4
I want to sum value over all rows with the same group, but only when those rows are consecutive. The expected output would be:
# output
group values
1 c 24
2 b 5
3 c 6
4 b 14
5 c 9
6 a 7
7 b 8
8 a 5
9 b 8
10 c 2
11 a 3
12 c 3
13 a 8
I considered using lag(), but then I would need to know in advance how many times a group can occur in a row. How can I solve this?
You can group by consecutive values with dplyr::consecutive_id or data.table::rleid:
With dplyr (consecutive_id and the .by argument require dplyr 1.1.0 or later):
library(dplyr)
df %>%
  mutate(id = consecutive_id(group)) %>%
  summarise(value = sum(value), .by = c(id, group)) %>%
  select(-id)
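To see why this works, it may help to look at what consecutive_id returns on its own. The sketch below uses only the first five group values from the question; the names g and ids are just for illustration.

```r
library(dplyr)

# First five group values from the question's data
g <- c("c", "c", "c", "b", "c")

# consecutive_id() starts at 1 and increments each time the value changes,
# so the second run of "c" gets a different id than the first run
ids <- consecutive_id(g)
ids
#> [1] 1 1 1 2 3
```

Grouping by this id (together with group) is what keeps the two separate runs of "c" from being summed together.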
In data.table:
library(data.table)
setDT(df)[, .(group = first(group), value = sum(value)), by = rleid(group)][, rleid := NULL][]
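Note that setDT converts df to a data.table by reference. If that is not wanted, a variant along the following lines should behave the same way without touching the original object. This is only a sketch on a small made-up data frame (toy is not the data from the question), and naming the run column id inside by is my own choice.

```r
library(data.table)

# Small illustrative data frame (hypothetical, not the question's data)
toy <- data.frame(group = c("a", "a", "b", "a", "a", "a"),
                  value = c(1, 2, 3, 4, 5, 6))

# as.data.table() makes a copy, so toy itself stays a plain data.frame
res <- as.data.table(toy)[
  , .(group = first(group), value = sum(value)),
  by = .(id = rleid(group))   # one id per run of identical group values
][, id := NULL][]             # drop the helper column

res
#>     group value
#> 1:      a     3
#> 2:      b     3
#> 3:      a    15
```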
A third option to create the grouping is with lag and cumsum:
# id increases by 1 whenever group differs from the previous row
df %>%
  mutate(id = cumsum(lag(group, default = "not a letter") != group))
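This only creates the id column. To get the sums, the same summarise step as in the consecutive_id version can follow it. A minimal sketch, again on a small made-up data frame (toy is hypothetical) and assuming group is a character column so that the string default for lag is valid:

```r
library(dplyr)

# Small illustrative data frame (hypothetical, not the question's data)
toy <- data.frame(group = c("a", "a", "b", "a", "a", "a"),
                  value = c(1, 2, 3, 4, 5, 6))

res <- toy %>%
  # TRUE wherever group differs from the previous row; cumsum turns that into run ids
  mutate(id = cumsum(lag(group, default = "not a letter") != group)) %>%
  # sum within each run, then drop the helper id
  summarise(value = sum(value), .by = c(id, group)) %>%
  select(-id)

res
#>   group value
#> 1     a     3
#> 2     b     3
#> 3     a    15
```

The default passed to lag just has to be a value that never appears in group, so that the first row always starts a new run.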