I have a 100x2 data frame, `DFN`. Running `fitdist` on column `DFN$LRet` gives the error message "function mle failed to estimate the parameters, with the error code 100". I figured the reason is that the last row contains an NA, so I ran `fitdist` excluding NAs. Now I get the error "data must be a numeric vector of length greater than 1". Any thoughts on how to resolve this? Thanks very much.
```r
library(fitdistrplus)

DFN <- structure(list(LRet = c(0.0011, 0, -0.0026, 0, -0.0015, 0.0038, 3e-04, -0.0021, 4e-04, -0.001, 0, 0.0019, -6e-04, -8e-04, -5e-04, -8e-04, 3e-04, -5e-04, -0.0026, 0.0014, 7e-04, 0, -2e-04, 0.0011, -0.0025, 0.0042, 0.0022, -0.0017, -0.0058, 1e-04, 2e-04, 8e-04, -9e-04, -0.0014, -0.0014, -0.001, -0.0032, -0.0015, 6e-04, -8e-04, 0.001, -0.0014, -0.0017, -8e-04, -0.001, 0.0011, 0.0013, -0.001, 5e-04, 9e-04, -8e-04, -0.0025, 0.0027, 6e-04, 2e-04, -6e-04, 9e-04, -3e-04, -7e-04, 3e-04, 0, 2e-04, -6e-04, 1e-04, -1e-04, -7e-04, -8e-04, 7e-04, -1e-04, -7e-04, 7e-04, 8e-04, -8e-04, 8e-04, 0.0058, -1e-04, -5e-04, 0.0027, -0.0012, 7e-04, 7e-04, 0, 3e-04, -1e-04, 2e-04, -2e-04, -0.0013, -1e-04, 1e-04, -0.0011, 0.0013, 2e-04, -3e-04, -7e-04, 0, 0.0015, 1e-04, 3e-04, -0.0012, NA), LRetPct = c("0.11%", "0.00%", "-0.26%", "0.00%", "-0.15%", "0.38%", "0.03%", "-0.21%", "0.04%", "-0.10%", "0.00%", "0.19%", "-0.06%", "-0.08%", "-0.05%", "-0.08%", "0.03%", "-0.05%", "-0.26%", "0.14%", "0.07%", "0.00%", "-0.02%", "0.11%", "-0.25%", "0.42%", "0.22%", "-0.17%", "-0.58%", "0.01%", "0.02%", "0.08%", "-0.09%", "-0.14%", "-0.14%", "-0.10%", "-0.32%", "-0.15%", "0.06%", "-0.08%", "0.10%", "-0.14%", "-0.17%", "-0.08%", "-0.10%", "0.11%", "0.13%", "-0.10%", "0.05%", "0.09%", "-0.08%", "-0.25%", "0.27%", "0.06%", "0.02%", "-0.06%", "0.09%", "-0.03%", "-0.07%", "0.03%", "0.00%", "0.02%", "-0.06%", "0.01%", "-0.01%", "-0.07%", "-0.08%", "0.07%", "-0.01%", "-0.07%", "0.07%", "0.08%", "-0.08%", "0.08%", "0.58%", "-0.01%", "-0.05%", "0.27%", "-0.12%", "0.07%", "0.07%", "0.00%", "0.03%", "-0.01%", "0.02%", "-0.02%", "-0.13%", "-0.01%", "0.01%", "-0.11%", "0.13%", "0.02%", "-0.03%", "-0.07%", "0.00%", "0.15%", "0.01%", "0.03%", "-0.12%", " NA%")), .Names = c("LRet", "LRetPct"), class = "data.frame", row.names = 901:1000)

# Following gives error code 100
f1 <- fitdist(DFN$LRet, "norm")
# Following gives error code 100
f1 <- fitdist(DFN$LRet, "norm", na.rm = TRUE)
# Following gives error "data must be a numeric vector of length greater than 1"
f1 <- fitdist(na.exclude(DFN$LRet), "norm")
# Same result using na.omit
f1 <- fitdist(na.omit(DFN$LRet), "norm")
```
Note that if I eliminate the last row, which contains the NA, the code above works fine. I would rather not have to eliminate the last row before running `fitdist` if that can be avoided.
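The second error can be reproduced in base R alone. This is a minimal sketch that assumes `fitdist` validates its input with an `is.vector()` test. That assumption is inferred from the error text and from the `is.vector` remark later in this thread, not from reading the fitdistrplus source. `na.exclude()` and `na.omit()` return the non-missing values with an extra `na.action` attribute, and `is.vector()` returns `FALSE` for any vector that carries an attribute other than names.

```r
# Base-R sketch (no fitdistrplus needed): why an na.exclude() result can fail an
# is.vector() check, and two ways to get a plain numeric vector back.
x <- c(0.0011, 0, -0.0026, NA)     # short stand-in for DFN$LRet, trailing NA

x_excl <- na.exclude(x)            # drops the NA but attaches an "na.action" attribute
print(names(attributes(x_excl)))   # "na.action"
print(is.vector(x_excl))           # FALSE: extra attribute besides names

x_c   <- c(x_excl)                 # c() drops all attributes except names
x_idx <- x[!is.na(x)]              # logical indexing adds no attributes
print(is.vector(x_c))              # TRUE
print(is.vector(x_idx))            # TRUE
```

Either plain vector can then be passed on, e.g. `fitdist(DFN$LRet[!is.na(DFN$LRet)], "norm")`. This only gets past the input check. It does not address the error code 100.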
EDIT/UPDATE: Eliminating the last row with the NA did solve the issue at first, but I can no longer reproduce that consistently (i.e. I have successfully run the code a few times after eliminating the last row, but not always). I am trying to understand why. I have tried a 25x2 data frame, a 100x2 one and a 300x2 one, as well as a plain vector, with similar results. I had thought the size of the data frame or vector might be part of the problem, hence the trials with different sizes.
(Also found the poorly written `is.vector` check in the `fitdist` code, which is what rejects the `na.exclude()` result, but fixing that didn't solve the errors.) The `fitdist` function seems to have difficulty with vectors of small variance:
```
> var(na.exclude(DFN$LRet))
[1] 2.220427e-06
```
You can get around that by multiplying by 10:
```
> f1 <- fitdist(10*c(na.exclude(DFN$LRet)),"norm")
> f1
Fitting of the distribution ' norm ' by maximum likelihood
Parameters:
          estimate  Std. Error
mean -0.0009090909 0.001490034
sd    0.0148256472 0.001032122
```
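The rescale-and-undo workflow can be sketched in base R. `fit_norm_rescaled` is a hypothetical helper, not part of fitdistrplus. It uses the closed-form normal MLEs (the sample mean, and the sd with divisor n) as a stand-in for `fitdist`. Because a closed-form fitter is unaffected by the factor `k`, the sketch only checks that dividing the fitted mean and sd by `k` recovers the unscaled estimates. The rescaling itself only helps a numerical optimizer like the one `fitdist` uses. The simulated data below are an assumption, not the `DFN` values.

```r
# Hypothetical helper (not part of fitdistrplus): fit a normal distribution by
# maximum likelihood after multiplying the data by k, then undo the scaling.
# Closed-form normal MLEs stand in for a numerical fitter such as fitdist().
fit_norm_rescaled <- function(x, k = 10) {
  x <- x[!is.na(x)]                        # drop NAs without adding attributes
  y <- k * x                               # rescaled data: Var(y) = k^2 * Var(x)
  n <- length(y)
  mean_y <- mean(y)                        # MLE of the mean
  sd_y   <- sqrt(sum((y - mean_y)^2) / n)  # MLE of the sd (divisor n, not n - 1)
  c(mean = mean_y / k,                     # E[kX] = k E[X]    -> divide by k
    sd   = sd_y / k)                       # SD(kX) = k SD(X)  -> divide by k
}

set.seed(1)
x <- c(rnorm(99, mean = 0, sd = 0.0015), NA)  # simulated returns, not the DFN data
est_scaled <- fit_norm_rescaled(x, k = 10)
est_direct <- fit_norm_rescaled(x, k = 1)
print(est_scaled)
print(all.equal(est_scaled, est_direct))      # TRUE (up to floating-point error)
```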
Standard probability theory then lets you correct those estimates. For Y = 10X, E[Y] = 10 E[X] and Var(Y) = 100 Var(X), so divide the mean by 10 and the variance by 100 (equivalently, the sd by 10). The corrected `fitdist` results are reasonably close to the sample values:
```
> all.equal( 0.0148256472/10 , sd(na.exclude(DFN$LRet) ) )
[1] "Mean relative difference: 0.005089095"
```
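The leftover difference comes from the divisor, not from estimation error. The normal MLE of the sd divides by n, whereas `sd()` divides by n - 1. `DFN$LRet` has n = 99 non-missing values (100 rows minus the NA), so the expected relative difference is sqrt(99/98) - 1 ≈ 0.005089. That agrees with the `all.equal()` output above to within `fitdist`'s numerical precision. Here is a short check that uses simulated data as a stand-in for the returns:

```r
# Sketch: the relative gap between sd() (divisor n - 1) and the normal MLE of
# the sd (divisor n) depends only on n, not on the data values.
n <- 99                                   # non-missing values in DFN$LRet
print(sqrt(n / (n - 1)) - 1)              # approx. 0.005089

set.seed(42)
x <- rnorm(n, sd = 0.0015)                # simulated stand-in for the returns
sd_mle <- sqrt(mean((x - mean(x))^2))     # MLE: divides by n
print(all.equal(sd_mle, sd(x)))           # mean relative difference of about 0.00509
```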