Search code examples
rparametersstatisticsdistributionestimation

Estimating parameter of a distribution for some generate random number


I just new in R for solving my statistical problem. Currently I'm working to estimate the parameters of a distribution using 200 random numbers (RN) that I generate using R. I generate 200 RN in 100 times. So it means there will be 100 kinds of 200 RN and I will estimate this 100 kinds of RN. It also means that there will be 100 kinds of estimation results. So here is the code I use to generate the RN:

#Generate random numbers U~(0, 1)
rep <-100 #total replication
unif <-matrix(0, 200, rep)
for (k in 1: rep)
{
  unif[,k] <- runif(200, min = 0, max = 1)
}

# Based on the 100 kinds of generated random numbers that follow U ~ (0.1), I will generate again 100 kinds of random numbers which follow the estimated distribution:
# Define parameters
a <- 49.05 #1st parameter
b <- 3.148 #2nd parameter
c <- 0.145 #3rd parameter
d <- 0.00007181 #4th parameter

X <-matrix(0, 200, rep)
for (k in 1: rep)
{
  X[,k] <- a*(log(1-((log(1-((unif[,k])^(1/c))))/(a*d))))^(1/b)
}

# Sorting the generated RN from the smallest to the largest
X_sort <-matrix(0, 200, rep)
for (k in 1: rep)
{
  X_sort[,k] <- sort(X[,k])
}

Up here I've managed to generate 100 kinds of RN that will be estimated. However, the problem I face now is how to estimate this 100 kinds of RN. I can only estimate one. Here is the code I use for estimation the parameter with maxLik package and the estimation method is BHHH:

xi = X_sort[,1]
log_likelihood<-function(theta,xi){
  p1 <- theta[1] #1st parameter
  p2 <- theta[2] #2nd parameter
  p3 <- theta[3] #3rd parameter
  p4 <- theta[4] #4th parameter
  logL=log((p4*p2*p3*((xi/p1)^(p2-1))*(exp(((xi/p1)^(p2))+(p4*p1*(1-(exp((xi/p1)^(p2)))))))*((1-(exp((p4*p1*(1-(exp((xi/p1)^(p2))))))))^(p3-1))))
  return(logL)
}
library(maxLik);
# Initial parameters
a <- 49.05 #1st parameter
b <- 3.148 #2nd parameter
c <- 0.145 #3rd parameter
d <- 0.00007181 #4th parameter
m <- maxLik(log_likelihood, start=c(a,b,c,d), xi = xi, method="bhhh");
summary(m)

Here is the result:

--------------------------------------------
Maximum Likelihood estimation
BHHH maximisation, 5 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -874.0024 
4  free parameters
Estimates:
      Estimate Std. error t value  Pr(> t)    
[1,] 4.790e+01  1.846e+00  25.953  < 2e-16 ***
[2,] 3.015e+00  1.252e-01  24.091  < 2e-16 ***
[3,] 1.717e-01  2.964e-02   5.793 6.91e-09 ***
[4,] 7.751e-05  6.909e-05   1.122    0.262    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------

To estimate the other 99 RN, I have to change manually xi = X_sort[,k] for k=1,2,...,100 , so for the second RN, it should turn into X_sort[,2], and so on until the hundredth RN. I think this is not efficient because it takes a long time to replace them one by one. So is there a way to modify this code so that it did not take long for estimating the other RN?


Solution

  • Firstly, I'd suggest you to rewrite your code in more compact way.

    1. Generating random numbers. There is no need to generate 100 vectors each of length 200 while we can generate a vector of length 100*200 and then write this vector column-wise into matrix. This can be done in the following way:

    rep <-100
    n <-  200
    unif <- matrix(runif(rep*n, min = 0, max = 1), n, rep)
    

    2. Calculating function of matrix. In R it is possible to apply vector functions to matrices. So in your case it will be:

    X <- a*(log(1-((log(1-((unif)^(1/c))))/(a*d))))^(1/b)
    

    3. Column-wise matrix sorting We can easily sort each column of the matrix using apply function. Parameter 2 means that we do it column-wise (1 stands for rows).

    X_sort <- apply(X, 2, sort)
    

    4. Performing estimations. Again, we can use apply here.

    estimations <- apply(X_sort, 2, function(x) maxLik(log_likelihood, start=c(a,b,c,d),
                                              xi = x, method="bhhh"))
    

    Then to print all the summaries you can do the following:

    lapply(estimations, summary)