If I have the following dataframe/tibble in R:
tibble(
period =
c(
"2010Q1",
"2010Q2",
"2010Q3",
"2010Q4",
"2010END",
"2011Q1",
"2011Q2",
"2011Q3",
"2011Q4",
"2011END",
"2012Q1",
"2012Q2",
"2012Q3",
"2012Q4",
"20120END",
"2010Q1",
"2010Q2",
"2010Q3",
"2010Q4",
"2010END",
"2011Q1",
"2011Q2",
"2011Q3",
"2011Q4",
"2011END",
"2012Q1",
"2012Q2",
"2012Q3",
"2012Q4",
"20120END"),
website = c(
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook"
),
values = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1)
)
How can i perform the lag calculation for each period value from the year prior e.g I would like to create the calculation for 2011Q1 - 2010Q1 and so on including 2011END - 2010END
So that i get a table which looks like the following:
tibble(
period =
c(
"2010Q1",
"2010Q2",
"2010Q3",
"2010Q4",
"2010END",
"2011Q1",
"2011Q2",
"2011Q3",
"2011Q4",
"2011END",
"2012Q1",
"2012Q2",
"2012Q3",
"2012Q4",
"20120END",
"2010Q1",
"2010Q2",
"2010Q3",
"2010Q4",
"2010END",
"2011Q1",
"2011Q2",
"2011Q3",
"2011Q4",
"2011END",
"2012Q1",
"2012Q2",
"2012Q3",
"2012Q4",
"20120END"),
website = c(
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"google",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook",
"facebook"
),
values = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1),
calculation = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,14,12,10,8,6,4,2,0,-2,-4,-6,-8,-10,-12,-14))
Here it makes sense we cannot compare to the period prior so it is NA - for the year 2011 all values are calculated like this:
Having some issues with using the lag() function when trying to group the data by the period column.
If every year is divided into 5 periods, lag(...,5)
should take the value 5 rows earlier that the one you are calculating.
example %>%
mutate(calculation = values - lag(values,5))
Output:
# A tibble: 30 x 3
period values calculation
<chr> <dbl> <dbl>
1 2010Q1 1 NA
2 2010Q2 2 NA
3 2010Q3 3 NA
4 2010Q4 4 NA
5 2010END 5 NA
6 2011Q1 6 5
7 2011Q2 7 5
8 2011Q3 8 5
9 2011Q4 9 5
10 2011END 10 5
# ... with 20 more rows
EDIT: As @AndrewGB accurately said, group_by(website)
must be added to separate operations for each website. Also, I'm assuming that rows are already arranged by period.
example %>%
group_by(website) %>%
mutate(calculation = values - lag(values,5))