
How to find the border values used by the cut function in R?


How can I find out which cut values (borders) the cut function in R generated when I did not specify any borders and only gave the number of intervals I need?

complexes_data2$FlatPlanAmount <- cut(complexes_data2$FlatPlanAmount, 3, labels = FALSE)

What are the border values?


Solution

  • The documentation of cut says the following in the first sentence of its Details section.

    Details
    When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals.

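    So, compute the length of the range with range and diff and divide it by the number of intervals. Adding multiples of this width to the minimum of the vector gives the interior break points. The two outer limits are not needed to reproduce the result, since they only have to lie beyond the extreme values.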

    First, some test data.

    set.seed(2021)
    x <- runif(100, 0, 10)
    y <- cut(x, 3, labels = FALSE)
    

    Now compute the breaks.

    brks <- min(x) + (1:2)*(diff(range(x)) / 3)
    brks
    #[1] 3.428711 6.690577
    
    z <- cut(x, breaks = c(-Inf, brks, Inf), labels = FALSE)
    identical(y, z)
    #[1] TRUE
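
The two break points above are only the interior ones. As a minimal sketch of the quoted Details sentence, the full vector of breaks, including the outer limits shifted by 0.1% of the range, might be rebuilt as follows. The names n, full_brks and same_codes are illustrative and not part of the original answer, and the sketch assumes x has no missing values and is not constant.

```r
set.seed(2021)
x <- runif(100, 0, 10)
n <- 3                        # number of intervals requested (illustrative name)

rx <- range(x)                # c(min(x), max(x))
dx <- diff(rx)                # length of the range

# n + 1 equally spaced points from min(x) to max(x)
full_brks <- seq.int(rx[1], rx[2], length.out = n + 1)

# move the two outer limits away by 0.1% of the range,
# as described in the Details section of the documentation
full_brks[c(1, n + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
full_brks

# the explicit breaks should give the same interval codes as cut(x, n)
same_codes <- identical(cut(x, n, labels = FALSE),
                        cut(x, breaks = full_brks, labels = FALSE))
same_codes
```

The middle elements of full_brks are the same interior break points as brks; only the first and last elements differ from min(x) and max(x).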
    

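    The following function does this for any numeric vector x and any number of intervals breaks.

    # Interior break points that cut(x, breaks) uses when breaks is a single number.
    where <- function(x, breaks, na.rm = TRUE){
      # multiples 1, ..., breaks - 1 of the interval width, added to the minimum
      min(x, na.rm = na.rm) + seq_len(breaks)[-breaks]*(diff(range(x, na.rm = na.rm)) / breaks)
    }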
    
    where(x, 3)
    #[1] 3.428711 6.690577
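
Another way to look at the borders, which is not part of the answer above, is to let cut keep its default labels and read the limits off the factor levels. The sketch below assumes the default label format "(lower,upper]" (that is, right = TRUE and include.lowest = FALSE). The names f, lv, inner, parts, lower and upper are illustrative. The values recovered this way are rounded to dig.lab significant digits, so they are approximations of the true breaks, and the outermost limits include the 0.1% shift.

```r
set.seed(2021)
x <- runif(100, 0, 10)

# Without labels = FALSE, cut() returns a factor whose levels name the intervals.
# dig.lab sets the number of significant digits used in those labels.
f <- cut(x, 3, dig.lab = 6)
lv <- levels(f)

# Drop the leading "(" and the trailing "]" and split each label at the comma.
inner <- substr(lv, 2, nchar(lv) - 1)
parts <- strsplit(inner, ",", fixed = TRUE)

lower <- as.numeric(vapply(parts, function(p) p[1], character(1)))
upper <- as.numeric(vapply(parts, function(p) p[2], character(1)))

data.frame(interval = lv, lower = lower, upper = upper)
```

For a data frame column such as the one in the question, either approach has to be applied to the original values, before the column is overwritten with the interval codes.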