Search code examples
rtimecrm

Obtaining average inter-purchase time with all dates in one column in R


I am currently developing a timeframe for a churn dataset in R. In order to do so I need the average inter-purchase time of each customer.

The example data below shows two customers (customer 1 and 2) buying on 3 and 2 separate occasions respectively. The average time between purchases for customer 1 is 7.5 days ((9+6)/2), and the average time between purchases for customer 2 is simply 5 days, as there are only two observations for this customer. In code it looks like this:

df <- data.frame(cust_id=c(1,2,1,2,1), 
    order_date=as.Date(c("2012-8-14", "2012-7-1", "2012-8-23", "2012-7-6" 
    "2012-8-29")))

and graphically it looks something like this:

    cust_id    order_date
1   1          2012-8-14 
2   2          2012-7-1
3   1          2012-8-23
4   2          2012-7-6
5   1          2012-8-29

Eventually I want it to look like this:

    cust_id    avg_interpurchase_time
1   1          7.5 
2   2          5

Is anyone able to point in me the right direction?

Thanks!

P.S. I have looked at the following post: Calculating Inter-purchase Time in R but I believe my question differs from the question posed there. I have all dates in one column and one customer can have up to 80 dates, whereas the guy who posted that has only two dates for each customer spread out over 2 columns.


Solution

  • In base R, you could use aggregate together with a custom function:

    aggregate(order_date ~ cust_id, data=df, FUN=function(x) mean(diff(x)))
      cust_id order_date
    1       1       7.5 
    2       2       5.0 
    

    Here, we take the difference by order date and then calculate the mean. Note that this requires that the data are sorted by date. You could make sure this is true by including order in the call to the data.frame, as in data=df[order(df$order_date),] for example.

    data
    Includes a couple of typo corrections from OP.

    df <- 
    structure(list(cust_id = c(1, 2, 1, 2, 1), order_date = structure(c(15566, 
    15522, 15575, 15527, 15581), class = "Date")), .Names = c("cust_id", 
    "order_date"), row.names = c(NA, -5L), class = "data.frame")