Tags: r, time-series, trend

How to determine trend of time-series of values in R


I am looking for help writing a function that can identify a trend ("positive/negative/mixed", see definition below) in a value for a given customer in a dataset.

I have the following transactional data; all customers have between 3 and 13 transactions each.

customer_ID transaction_num sales
Josh         1              $35
Josh         2              $50
Josh         3              $65
Ray          1              $65
Ray          2              $52
Ray          3              $49
Ray          4              $15
Eric         1              $10 
Eric         2              $13
Eric         3              $9

I would like to write a function in R that populates a new data frame as follows:

Customer_ID     Sales_Slope
Josh            Positive
Ray             Negative
Eric            Mixed

where:

Josh's slope is positive because his sales increase with every additional transaction.

Ray's slope is negative because his sales decrease with every additional transaction.

Eric's slope is mixed because his sales fluctuate with no clear trend.

I have tried quite extensively to do this myself but am stuck. Here is some pseudo-code I have been able to put together:

counter = max(transaction_num)
while counter > 1
  if sales at counter are greater than sales at counter - 1
    then counter = counter - 1
    else stop: "not positive slope trend"
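
The pseudo-code above might translate to R roughly as follows. This is only a sketch for a single customer's sales vector: the function name `is_strictly_increasing` is hypothetical, and the values are assumed to be sorted by transaction_num already. It walks backwards from the last transaction and stops at the first pair that does not increase.

```r
# Hypothetical helper: TRUE if every value is strictly greater than the one before it.
# Assumes `sales` is already sorted by transaction number.
is_strictly_increasing <- function(sales) {
  counter <- length(sales)
  while (counter > 1) {
    if (sales[counter] > sales[counter - 1]) {
      counter <- counter - 1   # this pair increases, move one step back
    } else {
      return(FALSE)            # "not positive slope trend"
    }
  }
  TRUE
}

is_strictly_increasing(c(35, 50, 65))  # Josh's sales
is_strictly_increasing(c(10, 13, 9))   # Eric's sales
```

The same loop with the comparison reversed would test for a negative trend; a customer who passes neither test would be "mixed".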

Solution

  • I think I would start with something like this. data.table is usually pretty efficient with bigger datasets.

    #Make fake data
    library(data.table)
    data <- data.table(customer_ID=c(rep("Josh",3),rep("Ray",4),rep("Eric",3)),
                       sales=c(35,50,65,65,52,49,15,10,13,9))
    data[,transaction_num:=seq_len(.N),by=c("customer_ID")]
    

    Now for the actual code.

    data <- data.table(data)
    
    #Calculate the change between consecutive transactions
    #(diff() assumes rows are in transaction order, hence order())
    rolled.up <- data[order(transaction_num),
      list(Change=list(diff(sales))),
      by=c("customer_ID")]
    
    #Count positive and negative changes separately, so a
    #repeated value (change of 0) is not counted as a decrease
    rolled.up[,Pos.Values:=sapply(Change,function(x) sum(x>0))]
    rolled.up[,Neg.Values:=sapply(Change,function(x) sum(x<0))]
    
    #Make Sales Slope variable
    rolled.up[,Sales_Slope:=ifelse(Pos.Values>0 & Neg.Values==0,"Positive",
          ifelse(Pos.Values==0 & Neg.Values>0,"Negative","Mixed"))]
    
    #Make final table
    final.table <- rolled.up[,list(customer_ID,Sales_Slope)]
    final.table
    
    #      customer_ID Sales_Slope
    # 1:        Josh    Positive
    # 2:         Ray    Negative
    # 3:        Eric       Mixed
    
    #You can always merge this result back onto your main dataset if you want
    data <- merge(x=data,y=final.table,by=c("customer_ID"),all.x=TRUE)
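
Since the question asks for a function, the same idea can also be sketched in base R without data.table. The names `classify_trend`, `sales_data`, and `result` are hypothetical, and the sketch assumes every customer has at least two transactions (the question says 3 to 13). In this version a repeated value, i.e. a change of 0, makes the trend "Mixed"; whether that is the desired rule is a judgment call.

```r
# Hypothetical helper: classify one customer's sales vector.
# Assumes `sales` is sorted by transaction number and has at least two values.
classify_trend <- function(sales) {
  change <- diff(sales)          # change between consecutive transactions
  if (all(change > 0)) {
    "Positive"
  } else if (all(change < 0)) {
    "Negative"
  } else {
    "Mixed"                      # includes flat steps (change of 0)
  }
}

# Same example data as in the question
sales_data <- data.frame(
  customer_ID     = c(rep("Josh", 3), rep("Ray", 4), rep("Eric", 3)),
  transaction_num = c(1:3, 1:4, 1:3),
  sales           = c(35, 50, 65, 65, 52, 49, 15, 10, 13, 9)
)

# Sort so that each customer's sales are in transaction order
sales_data <- sales_data[order(sales_data$customer_ID, sales_data$transaction_num), ]

# Apply the helper to each customer's sales
slopes <- tapply(sales_data$sales, sales_data$customer_ID, classify_trend)
result <- data.frame(Customer_ID = names(slopes),
                     Sales_Slope = as.vector(slopes))
result
```

Note that tapply() returns the customers in sorted (alphabetical) order rather than in order of first appearance.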