I have a question about creating lag variables that depend on a time factor.
Basically, I am working with a baseball dataset in which each player's name appears in multiple rows covering 2002-2012. I only want lag variables computed within the same player, so that I can build a career arc and use it to predict the current stat. For example, I want to use the lag 1 average (2004) and the lag 2 average (2003) to predict the current average in 2005. So I tried to write a loop that goes through every row (the data frame is already sorted by name and then year, so the previous year is row n-1), checks whether the name is the same, and if so grabs the value from the previous row.
Here is my loop:
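TS$runvalueL1 <- NA_real_  # create the column first so it can be filled row by row
for (i in 2:nrow(TS)) {  # start at 2 (row 1 has no previous row); nrow(TS) is 6264 here
  if (TS$name[i] == TS$name[i - 1]) {
    # same player as the previous row: copy that row's value
    TS$runvalueL1[i] <- TS$Run_Value[i - 1]
  } else {
    # first row for a new player: set only this row to NA, not the whole column
    TS$runvalueL1[i] <- NA
  }
}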
Because each row is dependent on the name I cannot use most of the lag functions. If you have a better idea I am all ears!
Sample data won't help much, but here is some:
Edit: my made-up sample data wasn't producing usable results, so I have attached the first 10 rows of my dataset instead. Thanks!
TS[(6:10),c('name','Season','Run_Value')]
                   name Season Run_Value
321           Abad Andy   2003     -1.05
3158 Abercrombie Reggie   2006     27.42
1312 Abercrombie Reggie   2007      7.65
1069 Abercrombie Reggie   2008      5.34
4614    Abernathy Brent   2002     46.71
707     Abernathy Brent   2003     -2.29
1297    Abernathy Brent   2005      5.59
6024        Abreu Bobby   2002    102.89
6087        Abreu Bobby   2003    113.23
6177        Abreu Bobby   2004    128.60
Thank you!
Something along these lines should do it:
library(data.table)

names = c("Adams", "Adams", "Adams", "Adams", "Bobby", "Bobby", "Charlie")
years = c(2002, 2003, 2004, 2005, 2004, 2005, 2010)
Run_value = c(10, 15, 15, 20, 10, 5, 5)
dt = data.table(names, years, Run_value)

# Within each name: prepend NA and drop the last value (.N is the group size),
# so the lagged vector has the same length as the group
dt[, lag1 := c(NA, Run_value[-.N]), by = names]
dt
#      names years Run_value lag1
# 1:   Adams  2002        10   NA
# 2:   Adams  2003        15   10
# 3:   Adams  2004        15   15
# 4:   Adams  2005        20   15
# 5:   Bobby  2004        10   NA
# 6:   Bobby  2005         5   10
# 7: Charlie  2010         5   NA
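
If you would rather stay in base R, or you need more than one lag, here is a minimal sketch of the same per-player idea using `ave()`. The helper name `lag_by_group` and the new column names are my own, not anything built in, and the data frame is just six rows copied from the sample in the question. The last step is an assumption about what you want: Abernathy Brent has no 2004 row, so a row-based lag treats 2003 as the season before 2005, while matching on `Season - 1` gives `NA` there instead.

```r
# Six rows copied from the sample data in the question
TS <- data.frame(
  name      = c(rep("Abernathy Brent", 3), rep("Abreu Bobby", 3)),
  Season    = c(2002, 2003, 2005, 2002, 2003, 2004),
  Run_Value = c(46.71, -2.29, 5.59, 102.89, 113.23, 128.60)
)

# Hypothetical helper: shift x down by k rows within each group,
# padding the start of every group with NA
lag_by_group <- function(x, group, k = 1) {
  ave(x, group, FUN = function(v) {
    c(rep(NA_real_, min(k, length(v))), head(v, -k))
  })
}

# Row-based lags: assume the data are sorted by name, then Season
TS$runvalueL1 <- lag_by_group(TS$Run_Value, TS$name, 1)
TS$runvalueL2 <- lag_by_group(TS$Run_Value, TS$name, 2)

# Season-aware lag 1: look up the row for the same name and Season - 1,
# which yields NA when that season is missing for the player
key <- paste(TS$name, TS$Season)
TS$prevSeason <- TS$Run_Value[match(paste(TS$name, TS$Season - 1), key)]

TS
```

The row-based version depends on the sort order, just like the loop in the question. The `match()` version does not, which may be the safer choice if some players skip seasons.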