Search code examples
rdistributionbayesianempirical-distribution

MASS packages' "fitdistr": Error when dealing with manipulated random data


Background:

Below I have generated some random beta data using R and manipulate the shape of the data a bit to arrive at what I call "Final" in my code. And I histogram "Final" in my code.

Question:

I'm wondering why when trying to fit a "beta" distribution to "Final" data using MASS packages' "fitdistr" function, I get the following error (Any suggestion how to avoid this error)?

Error in stats::optim(x = c(0.461379379270288, 0.0694261016478062, 0.76934266883081, : initial value in 'vmmin' is not finite

Here is my R code:

 require(MASS)

## Generate some data and manipulate it
set.seed(47)

Initial = rbeta(1e5, 2, 3)
d <- density(Initial)

b.5 <- dbeta(seq(0, 1, length.out = length(d$y)), 50, 50)
b.5 <- b.5 / (max(b.5) / max(d$y))    # Scale down to max of original density

 b.6 <- dbeta(seq(0, 1, length.out = length(d$y)), 60, 40)
 b.6 <- b.6 / (max(b.6) / max(d$y))

 # Collect maximum densities at each x to use as sample probability weights
 p <- pmax(d$y, b.5, b.6)


Final <- sample(d$x, 1e4, replace = TRUE, prob = p) ## THIS IS MY FINAL DATA

hist(Final, freq = F, ylim = c(0, 2))               ## HERE IS A HISTOGRAM

 m <- MASS::fitdistr(Final, "beta",          ## RUN THIS TO SEE HOW THE ERROR COMES UP
                start = list(shape1 = 1, shape2 = 1))

Solution

  • Here is the code.

    It is the same with your code, I just removed the negative beta values.

    library(MASS)
    
    set.seed(47)
    
    Initial = rbeta(1e5, 2, 3)
    d <- density(Initial)
    
    b.5 <- dbeta(seq(0, 1, length.out = length(d$y)), 50, 50)
    
    
    b.5 <- b.5 / (max(b.5) / max(d$y))    # Scale down to max of original 
    density
    
    b.6 <- dbeta(seq(0, 1, length.out = length(d$y)), 60, 40)
    b.6 <- b.6 / (max(b.6) / max(d$y))
    
    # Collect maximum densities at each x to use as sample probability weights
    p <- pmax(d$y, b.5, b.6)
    
    
    Final <- sample(d$x, 1e4, replace = TRUE, prob = p) ## THIS IS MY FINAL DATA
    
    hist(Final, freq = F, ylim = c(0, 2))               ## HERE IS A HISTOGRAM
    

    This is the original histogram

    # replace negative beta values with smallest value > 0
    Final[Final<= 0] <- min(Final[Final>0])
    
    hist(Final, freq = F, ylim = c(0, 2))
    

    The histogram after removing negative values

    m <- MASS::fitdistr(x = Final, densfun = "beta",          
                    start = list(shape1 = 1, shape2 = 1))
    

    Here are the shape parameters:

    > m
         shape1       shape2  
      1.99240852   2.90219720 
     (0.02649853) (0.04010168)
    

    Take note that it gives some warnings.