r, cdf, kolmogorov-smirnov

Error in ks.test caused by an incorrectly written cumulative distribution function (the discrete distribution has no built-in function in R)


I have this discrete data and I want to run a one-sample Kolmogorov-Smirnov test, but when I run the following code it gives me an error:

 d <- c(5, 11, 21, 31, 46, 75, 98, 122, 145, 165, 196, 224, 245, 293,321, 330, 350, 420)

 #log likelihood function for the discrete distribution
 #*********************************************************

loglik <-function(param){   

  q <- param[1]
  if(q<=0){return(NaN)}
  b <- param[2]
  if(b<=0){return(NaN)}
  c <- param[3]
  if(c<=0){return(NaN)}
  sum(log((q^(sqrt(d)*(1+b*c^d)))-(q^(sqrt(d+1)*(1+b*c^(d+1)))))) 
}

# maximum likelihood estimation using maxLik function
#*****************************************************
library(maxLik)
mle <- maxLik(loglik, start=c(q=0.9658,b=0.1237,c=1.0086), control=list(printLevel=2)) 


# the cumulative distribution function of the discrete distribution
#******************************************************************* 

cdf <- function(param){    
 if(q<=0){return(NaN)}
  b <- param[2]
  if(b<=0){return(NaN)}
  c <- param[3]
  if(c<=0){return(NaN)}
   
 cdf= 1-q^(sqrt(d)(1+b*c^d))
}
#one-sample kolmogorov smirnov test
#**********************************

ks<- ks.test(d, cdf, q=coef(mle)[1], b=coef(mle)[2], c= coef(mle)[3] ) 

Error:

Error in y(sort(x), ...) : 
  unused arguments (q = coef(mle)[1], b = coef(mle)[2], c = coef(mle)[3])

When I try to test the cdf function on its own,

cdf(q=2,b=3,c=3)

R gives me the following error:

Error in cdf(q = 2, b = 3, c = 3) : 
  unused arguments (q = 2, b = 3, c = 3)

I think the error in ks.test comes from an incorrectly written cumulative distribution function.


Solution

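  • The code in the question has several bugs. The function cdf is defined with a single argument, param, but is called with three named arguments (q, b, c), which is what triggers the "unused arguments" error. In addition, q is never assigned inside cdf, the exponent sqrt(d)(1+b*c^d) is missing a * between its two factors, and cdf does not take the data as its first argument, even though ks.test calls it as y(sort(x), ...). A corrected version: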

    d <- c(5, 11, 21, 31, 46, 75, 98, 122, 145, 165, 196, 224, 245, 293,321, 330, 350, 420)
    
    # log likelihood function for the discrete distribution
    
    loglik <-function(param){   
      if(any(param <= 0)){
        NaN
      } else {
        q <- param[1]
        b <- param[2]
        c <- param[3]
        sum(log((q^(sqrt(d)*(1+b*c^d)))-(q^(sqrt(d+1)*(1+b*c^(d+1)))))) 
      }
    }
    
    # maximum likelihood estimation using maxLik function
    library(maxLik)
    
    start_param <- c(q = 0.9658, b = 0.1237, c = 1.0086)
    mle <- maxLik(loglik, start = start_param, control=list(printLevel=2)) 
    
    
    # the cumulative distribution function of the discrete distribution
    
    cdf <- function(d, param){    
      if(any(param <= 0)){
        NaN
      } else {
        q <- param[1]
        b <- param[2]
        c <- param[3]
        1 - q^(sqrt(d)*(1+b*c^d))
      }
    }
    
    #one-sample kolmogorov smirnov test
    ks <- ks.test(d, "cdf", coef(mle)) 
    #
    #   One-sample Kolmogorov-Smirnov test
    #
    #data:  d
    #D = 0.080566, p-value = 0.9991
    #alternative hypothesis: two-sided
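
To illustrate why the original call failed, here is a minimal sketch of how ks.test forwards its extra arguments to the CDF. It assumes only base R (the stats package) and, so that it runs without maxLik, it uses the question's starting values as illustrative parameters rather than the fitted MLE, so the resulting statistic is not the one reported above. The names p, bad and good are hypothetical and exist only for this example.

```r
d <- c(5, 11, 21, 31, 46, 75, 98, 122, 145, 165, 196, 224, 245, 293, 321, 330, 350, 420)

# CDF with the data as first argument and the parameter vector as second
cdf <- function(x, param) {
  q <- param[1]
  b <- param[2]
  c <- param[3]
  1 - q^(sqrt(x) * (1 + b * c^x))
}

# Illustrative parameters only: the question's starting values, not the MLE
p <- c(q = 0.9658, b = 0.1237, c = 1.0086)

# Named arguments that cdf does not declare are forwarded through `...`
# and rejected by cdf; tryCatch captures the error message as a string
bad <- tryCatch(
  ks.test(d, cdf, q = p[1], b = p[2], c = p[3]),
  error = function(e) conditionMessage(e)
)

# A single parameter vector is forwarded to cdf as its second argument
good <- ks.test(d, cdf, p)
```

Here bad holds the error message from the mismatched call, while good is an ordinary htest object. Passing the function itself (cdf) or its name as a string ("cdf") should behave the same way, since ks.test looks the function up by name when given a character string.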