
How to group close values in a vector in R?


I would like to group the values in a vector that are close to each other. For example, given:

    x <- c(1.001, 1.002, 1.003, 2.0)

I want to put values whose differences are less than 0.1 into one group (here c(1.001, 1.002, 1.003)) and then replace each value with the mean of its group:

    result <- c(1.002,1.002,1.002,2)

Thanks! hees


Solution

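  • Here is a solution that also handles unsorted vectors. It sorts the values, starts a new group wherever the gap between neighbouring sorted values is at least 0.1, and then puts the group means back in the original order: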

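    x <- c(1.001, 1.002, 2.0, 1.003, 2.01, 3)
    o <- order(x)                               # permutation that sorts x
    y <- x[o]                                   # sorted values
    g <- cumsum(c(0, diff(y)) >= 0.1)           # new group wherever the gap is >= 0.1
    res <- tapply(y, g, mean)[as.character(g)]  # group mean for each sorted value
    res[order(o)]                               # undo the sort: order(o) is the inverse of o
    #    0     0     1     0     1     2 
    #1.002 1.002 2.005 1.002 2.005 3.000 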
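The same gap rule can be written a little more compactly with base R's `ave()`, which returns the group means already aligned with its input, so neither the names lookup nor an explicit inverse permutation is needed. This is only a sketch: the function name `group_close_means()` and its `tol` argument are hypothetical, chosen for illustration.

```r
# Hypothetical helper: replace each value with the mean of its "close" group,
# where a new group starts whenever the gap between consecutive sorted values
# is at least `tol` (the same rule as the answer above).
group_close_means <- function(x, tol = 0.1) {
  o <- order(x)                              # permutation that sorts x
  g <- integer(length(x))                    # group id for each original position
  g[o] <- cumsum(c(0, diff(x[o])) >= tol)    # ids computed in sorted order, written back by position
  ave(x, g)                                  # mean of each group, aligned with x
}

group_close_means(c(1.001, 1.002, 2.0, 1.003, 2.01, 3))
#> [1] 1.002 1.002 2.005 1.002 2.005 3.000
```

Because the ids are written back through `g[o] <-`, each one already sits at its value's original position, so the result is in the input order for any permutation, not only for inputs where `o` happens to be its own inverse.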
    

    Edit:

    The asker commented: "For example, with the test data x <- c(1.1, 1.2, 1.3, 1.4, 1.5, 1.6) I want to group values that are within 0.3 of each other. Your code gives a single group in which every value becomes 1.35. What I actually want is two groups, one containing 1.1, 1.2, 1.3, 1.4 and another containing 1.5, 1.6, and then the mean of each group."
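The gap rule from the first answer cannot split this vector: every consecutive difference is about 0.1, which is below 0.3, so no gap ever opens a new group and the values chain into one. A quick check, simply reapplying that rule with `d = 0.3`:

```r
x <- c(1.1, 1.2, 1.3, 1.4, 1.5, 1.6)
d <- 0.3
diff(x)                          # every gap is about 0.1, all below d
g <- cumsum(c(0, diff(x)) >= d)  # gap rule: no gap reaches d, so every id is 0
tapply(x, g, mean)               # a single group whose mean is 1.35
```

The grouping the asker wants measures distance from the first value of the current group rather than from the previous value, which is what the loop below does.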

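    Here is a quick solution using a for loop. It assumes that x is sorted in increasing order. Each group is anchored at its first value, and a new group starts as soon as a value is more than d away from that anchor. If this is too slow, the loop would be straightforward to port to C++ with Rcpp.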

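    x <- c(1.1, 1.2, 1.3, 1.4, 1.5, 1.6)  # must be sorted
    d <- 0.3                               # maximum distance from the group's first value
    
    s <- x[1]                              # first value (anchor) of the current group
    g <- numeric(length(x))                # group id for each element
    g[1] <- 1
    
    for (i in seq_along(x[-1]) + 1) {      # i = 2, ..., length(x)
      g[i] <- if (x[i] - s <= d) {
        g[i - 1]                           # within d of the anchor: same group
      } else {
        s <- x[i]                          # too far: start a new group anchored at x[i]
        g[i - 1] + 1
      }
    }
    
    tapply(x, g, mean)[as.character(g)]    # look up each element's group mean
    #   1    1    1    1    2    2 
    #1.25 1.25 1.25 1.25 1.55 1.55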
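The loop can be wrapped in a function that drops the "x must be sorted" assumption by sorting first and writing the means back to the original positions. This is a sketch under two stated assumptions: the name `window_group_means()` is hypothetical, and the `eps` argument is an added tolerance (not part of the answer above) so that differences equal to `d` on paper, such as 1.4 - 1.1 versus 0.3, are not split by floating-point rounding.

```r
# Hypothetical helper generalising the loop above: each group is anchored at
# its smallest value and extends while values stay within `d` of that anchor.
# Unsorted input is sorted first; the result is returned in the original order.
# `eps` is an assumed small tolerance against floating-point rounding at the boundary.
window_group_means <- function(x, d, eps = 1e-9) {
  n <- length(x)
  if (n == 0) return(x)
  o <- order(x)
  y <- x[o]                     # sorted copy
  g <- integer(n)
  g[1] <- 1L
  start <- y[1]                 # anchor (first value) of the current group
  for (i in seq_len(n)[-1]) {
    if (y[i] - start <= d + eps) {
      g[i] <- g[i - 1]          # still within d of the anchor: same group
    } else {
      g[i] <- g[i - 1] + 1L     # too far: open a new group anchored at y[i]
      start <- y[i]
    }
  }
  res <- numeric(n)
  res[o] <- ave(y, g)           # group means, written back to the original positions
  res
}

window_group_means(c(1.1, 1.2, 1.3, 1.4, 1.5, 1.6), d = 0.3)
#> [1] 1.25 1.25 1.25 1.25 1.55 1.55
window_group_means(c(1.6, 1.1, 1.5, 1.3, 1.2, 1.4), d = 0.3)
#> [1] 1.55 1.25 1.55 1.25 1.25 1.25
```

Anchoring at the group's first value is what prevents the chaining seen with the gap rule; the trade-off is that the result depends on where each group starts, so the input must be processed in sorted order.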