Search code examples
rmodeling

Power Law in Excel works better than R?


I am trying to model some data. I am having better luck with Excel than R, but the Excel solution won't scale so I need to figure how to do this in R.

Excel will map a trendline to the data and a Power curve yields a reasonable y = 0.6462x^-0.542.

When I put the same data into R and try to model it with the continuous power-law in the poweRlaw package I get something like y = 0.14901x^-3.03671. The intercept is way too small and the alpha is way too big.

# 14 days of % of users retained
y = c(0.61431   , 0.42585   , 0.35427   , 0.33893   , 0.28853   , 0.26004   , 0.2352    , 0.20087   , 0.17969   , 0.1848    , 0.17311   , 0.17092   , 0.15777   , 0.14901)

y.pl = conpl$new(y)
y.pl_est = estimate_xmin(c_pl)
y.pl_est

# $KS
# 0.1068587
#
# $xmin
# 0.14901
#
# $pars
# 3.03673
#
# $ntail
# 14

Image shows how the models built from 14-days of data with Excel and R conpl() match up to the Actuals out to day 60

Is there a way to use lm or glm to do a power curve that give reasonable intercept and alpha?


Solution

  • I haven't used the poweRlaw package, but R's base nls (non-linear least squares) function gives results similar to what you got with Excel. If there were a difference, after checking my code for errors, my first thought would be "so much the worse for Excel" :).

    # Data
    dat = data.frame(x=1:14,
    y = c(0.61431   , 0.42585   , 0.35427   , 0.33893   , 0.28853   , 0.26004   , 0.2352    , 0.20087   , 0.17969   , 0.1848    , 0.17311   , 0.17092   , 0.15777   , 0.14901))
    
    # Model
    m1 = nls(y ~ a*x^b, list(a=1,b=1), data=dat)
    summary(m1)
    
    Formula: y ~ a * x^b
    
    Parameters:
      Estimate Std. Error t value Pr(>|t|)    
      a  0.62104    0.01307   47.51 4.94e-15 ***
      b -0.51460    0.01525  -33.74 2.92e-13 ***
    
    # Plot nls model
    curve(coef(m1)[1]*x^coef(m1)[2], from=1, to=14)
    
    # Add curve for Excel model in red
    curve(0.6462*x^(-0.542), from=1, to=14, col="red", lty=2, add=TRUE)
    
    # Add data points
    points(dat$x, dat$y)
    

    enter image description here