I'm looking for a more efficient way to create subsets in R. Using a data set where rows = products and columns = time, I would like to find those rows (products) where an item started selling in Week 1, then make that a subset. Then do the same for Week 2, etc.
set.seed(4); d <- data.frame(
product = seq(1:10),
week1= sample(0:1,10,replace=TRUE),
week2= sample(0:3,10,replace=TRUE),
week3=sample(0:5,10,replace=TRUE),
week4= sample(0:5,10,replace=TRUE),speed=sample(100:200,10),quality=sample(20:50,10)
)
The full data frame is d. So I need to know 2 things to find all the subsets: 1) that all previous weeks sales were ==0 , and then 2) that this week's sales were not zero.
No subsets should overlap since they group products by when they first entered the market.
I found a poor man's way to do this, but I know there has to be a better way!
INEFFICIENT WAY:
subset3<-d[d$week3 >0 & d$week2==0 & d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 ==0 & d$week2==0 & d$week1==0,]
SLIGHTLY MORE EFFICIENT, BUT STILL POOR
subset3<-d[d$week3 >0 & d$week2+d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 + d$week2 + d$week1==0,]
It feels like I should be able to do something like this but it doesn't work:
subset4<-d[d$week4 >0 & sum(d$week1:d$week3) ==0, ]
I don't think ddply or apply will work here, but maybe I'm wrong? The result I need are subsets of d, all the columns, like this:
subset3=
product week1 week2 week3 week4 speed quality
2 0 0 5 1 124 42
3 0 0 3 5 155 45
You could use something like:
d$weekstart <- apply(d[,-1],1,function(x) which(x>0)[1] )
This will identify the first non-zero selling week for each product. You could then use this column to split your dataset up like so:
result <- split(d,d$weekstart)
You can then access each subset like:
result[[1]]
Changing the 1
in the above code to the starting week you want to access will be analogous to having subset1
subset2
etc.