Tags: r, regression, curve-fitting, nls, non-linear-regression

`nls` fails to estimate parameters of my model


I am trying to estimate the constants of Heaps' law. I have the following data frame, novels_collection:

  Number of novels DistinctWords WordOccurrences
1                1         13575          117795
2                1         34224          947652
3                1         40353         1146953
4                1         55392         1661664
5                1         60656         1968274

Then I wrote the following function:

# Heaps' law: predicted number of distinct words for n word occurrences
heaps <- function(K, n, B){
  K * n^B
}
heaps(2, 117795, .7)  # just to check that it works

Here n is WordOccurrences, and K and B are the constants I need to estimate in order to predict DistinctWords.

I tried the following, but it gives me an error:

fitHeaps <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
    data = novels_collection[, 2:3],
    start = list(K = .1, B = .1), trace = TRUE)

The error is:

Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model

Any idea how I could fix this, or another way to fit the function and get the values of K and B?


Solution

  • If you take the log of both sides of y = K * n^B, you get log(y) = log(K) + B * log(n). This is a linear relationship between log(y) and log(n), so you can fit an ordinary linear regression of log(y) on log(n): the intercept estimates log(K) and the slope estimates B.

    logy <- log(novels_collection$DistinctWords)
    logn <- log(novels_collection$WordOccurrences)
    
    fit <- lm(logy ~ logn)
    
    para <- coef(fit)           ## log(K) and B
    para[1] <- exp(para[1])     ## K and B
    names(para) <- c("K", "B")
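If you still want the non-linear fit, the log-log estimates also make good starting values for nls. With the original start (K = .1, B = .1) the model predicts only about 0.3 to 0.4 distinct words per row, several orders of magnitude below the observed values, which plausibly lets the iterations wander into parameter values where the model overflows. The sketch below rebuilds a data frame from the five rows shown in the question (an assumption; use your full novels_collection in practice), takes the start values from lm on the log scale, and then runs nls on the original scale.

```r
# Assumption: a stand-in for novels_collection built from the five rows shown.
novels_collection <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Heaps' law, as defined in the question
heaps <- function(K, n, B) {
  K * n^B
}

# Step 1: linear fit on the log scale gives rough estimates of log(K) and B
lin <- lm(log(DistinctWords) ~ log(WordOccurrences), data = novels_collection)
start_vals <- list(K = exp(unname(coef(lin)[1])),  # back-transform the intercept
                   B = unname(coef(lin)[2]))       # the slope is B directly

# Step 2: non-linear least squares on the original scale, started near the answer
fitHeaps <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
                data  = novels_collection,
                start = start_vals)

print(start_vals)
print(coef(fitHeaps))
```

The two sets of estimates will usually differ somewhat: lm on the logs minimizes squared error on the log scale (multiplicative errors), while nls minimizes squared error in DistinctWords itself (additive errors), so which one to report depends on which error model you believe.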