Search code examples
rplyrsubsetinventory

Conditional subset based on summing adjacent columns in R (plyr package?)


I'm looking for a more efficient way to create subsets in R. Using a data set where rows = products and columns = time, I would like to find those rows (products) where an item started selling in Week 1, then make that a subset. Then do the same for Week 2, etc.

set.seed(4); d <- data.frame(
 product = seq(1:10),
 week1= sample(0:1,10,replace=TRUE), 
 week2= sample(0:3,10,replace=TRUE),
 week3=sample(0:5,10,replace=TRUE), 
 week4= sample(0:5,10,replace=TRUE),speed=sample(100:200,10),quality=sample(20:50,10)
)

The full data frame is d. So I need to know 2 things to find all the subsets: 1) that all previous weeks sales were ==0 , and then 2) that this week's sales were not zero.

No subsets should overlap since they group products by when they first entered the market.

I found a poor man's way to do this, but I know there has to be a better way!

INEFFICIENT WAY:

subset3<-d[d$week3 >0 & d$week2==0 & d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 ==0 & d$week2==0 & d$week1==0,]

SLIGHTLY MORE EFFICIENT, BUT STILL POOR

subset3<-d[d$week3 >0 & d$week2+d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 + d$week2 + d$week1==0,]

It feels like I should be able to do something like this but it doesn't work:

subset4<-d[d$week4 >0 & sum(d$week1:d$week3) ==0, ]

I don't think ddply or apply will work here, but maybe I'm wrong? The result I need are subsets of d, all the columns, like this:

subset3=

product week1 week2 week3 week4 speed quality
   2     0     0     5     1   124      42
   3     0     0     3     5   155      45

Solution

  • You could use something like:

    d$weekstart <- apply(d[,-1],1,function(x) which(x>0)[1] )
    

    This will identify the first non-zero selling week for each product. You could then use this column to split your dataset up like so:

    result <- split(d,d$weekstart)
    

    You can then access each subset like:

    result[[1]]
    

    Changing the 1 in the above code to the starting week you want to access will be analogous to having subset1 subset2 etc.