My code takes a very long time to run. I have a data frame with approximately 23000 columns and 600 rows, structured like this:
date <- c(30032015,30042015,31052015,30062015,31072015,31082015,30092015)
AAPL <- c(10,NA,NA,10,NA,NA,20)
MSFT <- c(10,NA,NA,30,NA,NA,25)
sales <- data.frame (date,AAPL,MSFT)
sales$date <- strptime (sales$date, format="%d%m%Y")
I want the April and May values to equal the March value and, likewise, the July and August values to equal the June value.
This is what I am currently doing:
sales[is.na(sales)] <- 0
for (i in 1:6) {
  for (j in 2:3) {
    sales[i, j] <- ifelse(sales[i, j] > 0, sales[i, j],
                   ifelse(sales[i - 1, j] > 0, sales[i - 1, j],
                   ifelse(sales[i - 2, j] > 0, sales[i - 2, j], NA)))
  }
}
However, on a big data frame this takes many hours. Is there a way to simply say that the values in months 4 and 5 are equal to the values in month 3, and so on?
Thank you in advance
You probably want the na.locf() function from the zoo package. It carries the last observation forward to replace NA values.
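To illustrate what "last observation carried forward" means, here is a small sketch on plain vectors. The vectors x and y are made up for illustration; x mirrors the AAPL column from the question. Note that, by default, na.locf() drops leading NAs, which changes the length of the result, so na.rm = FALSE is probably what you want when the result has to fit back into a data frame.

```r
library(zoo)

# Illustrative vector, same shape as the AAPL column in the question
x <- c(10, NA, NA, 10, NA, NA, 20)
na.locf(x)
# 10 10 10 10 10 10 20

# A vector that starts with NA (hypothetical example)
y <- c(NA, 5, NA)
na.locf(y)                 # leading NA is dropped: 5 5
na.locf(y, na.rm = FALSE)  # leading NA is kept:    NA 5 5
```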
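library(zoo)
# Run this on the original data, i.e. before the NAs are replaced with 0.
# na.rm = FALSE keeps leading NAs, so every column retains its original length.
sales[, 2:3] <- lapply(sales[, 2:3], na.locf, na.rm = FALSE)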
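For completeness, here is a self-contained sketch using the sample data from the question. Two things are my own choices rather than part of the original code: the dates are written as character strings and parsed with as.Date() instead of strptime(), and the columns are filled with lapply() and na.rm = FALSE so that a column starting with NA would keep its length. The NAs are left in place (no replacement with 0), since na.locf() needs them to know which cells to fill.

```r
library(zoo)

# Sample data from the question; dates as strings so the format is unambiguous
date <- c("30032015", "30042015", "31052015", "30062015",
          "31072015", "31082015", "30092015")
AAPL <- c(10, NA, NA, 10, NA, NA, 20)
MSFT <- c(10, NA, NA, 30, NA, NA, 25)
sales <- data.frame(date, AAPL, MSFT)
sales$date <- as.Date(sales$date, format = "%d%m%Y")

# Fill every value column in one pass, column by column (no row loop)
value_cols <- setdiff(names(sales), "date")
sales[value_cols] <- lapply(sales[value_cols], na.locf, na.rm = FALSE)

sales$AAPL
# 10 10 10 10 10 10 20
sales$MSFT
# 10 10 10 30 30 30 25
```

Because each column is processed as a whole vector, this should scale far better to 23000 columns than the nested loop, although I have not timed it on data of that size.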