I have a dataframe for which I have date data and cumulative counts.
I am trying to do a reverse of cumsum to get the daily counts but also getting the counts per group.
I am trying to go from dataframe A to dataframe B.
I am using R and tidyr
.
Here is the code :
df <- data.frame(cum_count = c(5, 14, 50, 5, 14, 50),
state = c("Alabama", "Alabama", "Alabama", "NY", "NY", "NY"),
Year = c(2012:2014, 2012:2014))
Dataframe A
cum_count state Year
1 5 Alabama 2012
2 14 Alabama 2013
3 50 Alabama 2014
4 5 NY 2012
5 14 NY 2013
6 50 NY 2014
Dataframe B
cum_count state Year
1 5 Alabama 2012
2 9 Alabama 2013
3 36 Alabama 2014
4 5 NY 2012
5 9 NY 2013
6 36 NY 2014
I have tried using the diff function :
df <- df %>%group_by(state)%>%
mutate(daily_count = diff(cum_count))
But I get
Error: Column
daily_count
must be length 3 (the number of rows) or one, not 2
Let me know what you think.
Thanks!
diff
returns length
one less than the original length and mutate
requires the output column to have the same length
as the original (or length 1 which can be recycled). We can append a value possibly NA
or the first
value of 'cum_count'
library(dplyr)
df %>%
group_by(state)%>%
mutate(daily_count = c(first(cum_count), diff(cum_count)))
# A tibble: 6 x 4
# Groups: state [2]
# cum_count state Year daily_count
# <dbl> <fct> <int> <dbl>
#1 5 Alabama 2012 5
#2 14 Alabama 2013 9
#3 50 Alabama 2014 36
#4 5 NY 2012 5
#5 14 NY 2013 9
#6 50 NY 2014 36
Or for this purpose, use lag
and subtract from the column itself
df %>%
group_by(state)%>%
mutate(daily_count = replace_na(cum_count - lag(cum_count), first(cum_count)))