Suppose we have the following data:
X Y
6
1
2
2
1 1
8
3
4
1
1 2
I want to convert it to:
X Y Y-1 Y-2 Y-3
6
1
2
2
1 1 2 2 1
8
3
4
1
1 2 1 4 3
That is: for rows with X=1
- take 3 previous Y
values and append them to this row.
I "brute-forced" it with a loop:
namevector <- c("Y-1", "Y-2", "Y-3")
mydata[ , namevector] <- ""
for(i in 1:nrow(mydata)){
if(mydata$X[i] != ""){mydata[i,3:5] <- mydata$Y[(i-1):(i-3)]}
}
But it was too slow for my dataset of ~300k points - about 10 minutes.
Then I found a post with a similar question, and they proposed which
function, which reduced the time to tolerable 1-2 minutes:
namevector <- c("Y-1", "Y-2", "Y-3")
mydata[ , namevector] <- ""
trials_rows <- which(mydata$X != "")
for (i in trials_rows) {mydata[i,3:5] <- mydata$Y[(i-1):(i-3)]}
But considering that which
takes less than a second - I believe I can somehow combine which
with some kind of transpose function, but I can't get my mind around it.
I have a big data frame (~300k rows), and ~6k rows have this "X" value.
Is there a fast and simple way to do it fast, instead of iterating through the results of which
function?
You can do this with a single assignment using some vectorised trickery:
mydata[trials_rows, namevector] <- mydata$Y[trials_rows - rep(1:3,each=length(trials_rows))]
mydata
# X Y Y-1 Y-2 Y-3
#1 NA 6
#2 NA 1
#3 NA 2
#4 NA 2
#5 1 1 2 2 1
#6 NA 8
#7 NA 3
#8 NA 4
#9 NA 1
#10 1 2 1 4 3
Basically, take each row in trials_rows
, look backwards three rows using a vectorised subtraction, and then overwrite the combination of trials_rows
in rows and namevector
in columns.
Reproducible example used here:
mydata <- structure(list(X = c(NA, NA, NA, NA, 1L, NA, NA, NA, NA, 1L),
Y = c(6L, 1L, 2L, 2L, 1L, 8L, 3L, 4L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-10L))