Tags: r, function, if-statement, text-mining

How to calculate the h-point (in R)


I am trying to write a function to calculate the h-point. The function is defined over a rank-frequency data frame. Consider the following data.frame:

DATA <- data.frame(frequency = c(64, 58, 54, 32, 29, 29, 25, 17, 17, 15, 12, 12, 10), rank = 1:13)

The formula for the h-point is:

$$
h = \begin{cases}
r, & \text{if there is a rank } r \text{ with } f(r) = r,\\
\dfrac{f(i)\,j - f(j)\,i}{j - i + f(i) - f(j)}, & \text{otherwise,}
\end{cases}
$$

where f(i) and f(j) are the frequencies at ranks i and j, and i and j are adjacent ranks (j = i + 1) such that i < f(i) and j > f(j).
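The second case is linear interpolation: it is the rank at which the straight line through (i, f(i)) and (j, f(j)) crosses the diagonal f(r) = r. Setting the interpolated frequency equal to r and collecting terms gives the formula above:

$$
r = f(i) + \frac{f(j) - f(i)}{j - i}\,(r - i)
\;\Longrightarrow\;
r\,\bigl(j - i + f(i) - f(j)\bigr) = f(i)\,j - f(j)\,i .
$$

As a worked example, take a hypothetical variant of DATA in which ranks 11 and 12 both have frequency 13, with all other frequencies unchanged. No rank then equals its frequency. Rank 12 is the last rank with f(r) > r, so i = 12, j = 13, f(12) = 13 and f(13) = 10:

$$
h = \frac{13 \cdot 13 - 10 \cdot 12}{13 - 12 + 13 - 10} = \frac{169 - 120}{4} = \frac{49}{4} = 12.25 .
$$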

This is what I've done so far:

h_point <- function(data){
  x <- seq(nrow(data))
  f_x <- data[["frequency"]][x]
  h <- which(x == f_x)
  if(length(h)>1) h
  else{
    i <- which(x < f_x)
    j <- which(x > f_x)
    s <- which(outer(i,j,"-") == -1, TRUE)
    i <- i[s[,1]]
    j <- j[s[,2]]
    cat("i: ",i, "j: ", j,"\n")
    f_x[i]*j - f_x[j]*i / (i-j + f_x[i]-f_x[j])
  }
}

In DATA, the h-point is 12, because rank 12 has frequency 12 (x == f_x). However:

h_point(DATA)
i:   j:   
numeric(0)

What am I doing wrong here?


Solution

  • I had a look at your previous post, "how to calculate h-point", but I must say that I don't quite follow your method for calculating the h-point.

    Based on the definition of the h-point that I found:

    [Figure: The definition of the h-point (cf. Popescu & Altmann 2006)]

    Reference: https://www.researchgate.net/figure/The-definition-of-the-h-point-cf-Popescu-Altmann-2006-25_fig1_281010850

    I think a simpler approach would be to use approxfun to build a piecewise-linear function frequency(rank) through the data points, and then use uniroot to find the rank at which frequency(rank) = rank, which is the h-point:

    get_h_point <- function(DATA) {
        fn_interp <- approxfun(DATA$rank, DATA$frequency)
        fn_root <- function(x) fn_interp(x) - x
        uniroot(fn_root, range(DATA$rank))$root
    }
    
    get_h_point(DATA)
    #[1] 12
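For comparison, the question's own h_point() can also be made to work. Its numeric(0) result has a clear cause. The check length(h) > 1 skips the single-match case (DATA has exactly one rank with f(r) = r, namely rank 12), so the function falls into the else branch. There, ranks 1 to 11 lie above the diagonal and only rank 13 lies below it, so no adjacent pair with j - i = 1 exists and i and j come back empty. Two further problems would surface in the interpolation case. The / operator binds more tightly than -, so the whole numerator needs its own parentheses, and the denominator uses i - j where the formula has j - i. The following is a minimal sketch with those fixes. It assumes the rows are sorted by decreasing frequency so that row k holds rank k. DATA2 is a hypothetical variant of DATA with no rank equal to its frequency.

```r
# Rank-frequency data from the question (ranks 1..13, decreasing frequency)
DATA <- data.frame(frequency = c(64, 58, 54, 32, 29, 29, 25, 17, 17, 15, 12, 12, 10),
                   rank = 1:13)

# Hypothetical variant with no fixed point: ranks 11 and 12 both get frequency 13
DATA2 <- DATA
DATA2$frequency[11:12] <- 13

# Sketch of the question's approach with the bugs fixed.
# Assumes rows are sorted so that row k holds rank k.
h_point <- function(data) {
  r <- seq_len(nrow(data))   # ranks 1..n
  f <- data[["frequency"]]   # f(r)

  # Case 1: a rank equals its frequency (>= 1, not > 1, so a single match counts)
  h <- which(r == f)
  if (length(h) >= 1) return(h[1])

  # Case 2: i is the last rank above the diagonal f(r) = r, j is the next rank
  above <- which(f > r)
  if (length(above) == 0) stop("f(r) <= r already at rank 1: no crossing")
  i <- max(above)
  j <- i + 1
  if (j > length(f)) stop("f(r) > r at every rank: no crossing")

  # Whole numerator parenthesised; denominator uses j - i, as in the formula
  (f[i] * j - f[j] * i) / (j - i + f[i] - f[j])
}

h_point(DATA)    # 12 (rank 12 has frequency 12)
h_point(DATA2)   # 12.25 = (13 * 13 - 10 * 12) / (13 - 12 + 13 - 10)

# Cross-check with the approxfun()/uniroot() approach from the answer
f2 <- approxfun(DATA2$rank, DATA2$frequency)
uniroot(function(x) f2(x) - x, range(DATA2$rank), tol = 1e-10)$root  # about 12.25
```

The two approaches agree because the closed-form formula is exactly the crossing of the straight line between (i, f(i)) and (j, f(j)) with f(r) = r. approxfun() builds those same line segments, and uniroot() locates the crossing numerically, so its result is approximate to within the chosen tolerance.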