Tags: r, cut, dplyr

Why do I get a 'breaks not unique' error in my R code?


I'm new to R, so this may be a silly mistake. I'm trying to use the cut function, but I keep getting the same error:

Error: Problem with `mutate()` input `Calls_bucket`.
x 'breaks' are not unique
i Input `Calls_bucket` is `cut(...)

Here's my code (I've tried many different variations; here are the two most recent):

m3 <- m2 %>%
  mutate(Calls_bucket=cut(Calls_per_Hour,c(2,4,6,8,10,12,14,16,18,20,max(Calls_per_Hour, na.rm=T)),
                         labels=c("0-2","2-4","4-6","6-8","8-10","10-12","12-14","14-16","16-18","18-20",">20")))

m3 <- m2 %>%
  mutate(Calls_bucket=cut(Calls_per_Hour,breaks=c(2,4,6,8,10,12,14,16,18,20,max(Calls_per_Hour, na.rm=T)),labels=c("0-2","2-4","4-6","6-8","8-10","10-12","12-14","14-16","16-18","18-20",">20")))

I can get it to work if I simply pick the number of breaks, but I want to define them specifically. This code works, for example:

m3 <- m2 %>%
  mutate(Calls_bucket=cut(Calls_per_Hour,12))

Thanks in advance. Any help would be greatly appreciated.


Solution

  • When defining the breaks, wrap them in unique() if you include max(Calls_per_Hour). This worked for me:

    m3 <- m2 %>%
        mutate(Calls_bucket = cut(Calls_per_Hour,
                                  breaks = unique(c(0,2,4,6,8,10,12,14,16,18,20,max(Calls_per_Hour, na.rm=TRUE))),
                                  labels = c("0-2","2-4","4-6","6-8","8-10","10-12","12-14","14-16","16-18","18-20",">20"),
                                  include.lowest = TRUE))
    
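    • unique() removes duplicate break values, so if max(Calls_per_Hour) equals one of the values in your vector (for example 20), the breaks stay unique. Note that dropping a duplicate also removes one interval, so the labels vector must then be one element shorter.
    • Since your labels start at 0, you should also include 0 in your breaks.
    • Setting include.lowest=TRUE makes the first interval include its lower bound, so a value exactly equal to the lowest break (0 here) is assigned a label instead of NA.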
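
As a complement to the points above, here is a minimal base R sketch that first reproduces the error and then shows one possible alternative: using Inf as the upper break instead of max(Calls_per_Hour). This is not part of the original answer. The vector calls and the names fixed_breaks, bucket_labels, and buckets are hypothetical, chosen only for illustration, and the sample data is made up so that its maximum coincides with a fixed break.

```r
# Hypothetical sample data; the value 20 coincides with one of the fixed breaks.
calls <- c(0, 1.5, 3, 7.2, 12, 20, NA)

fixed_breaks <- seq(0, 20, by = 2)  # 0, 2, 4, ..., 20

# Reproduce the problem: max(calls) is 20, so 20 appears twice in the breaks.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  cut(calls, breaks = c(fixed_breaks, max(calls, na.rm = TRUE))),
  error = function(e) conditionMessage(e)
)
print(res)

# Alternative (an assumption, not the answerer's method): an open-ended top
# bucket via Inf, which can never collide with a finite break value.
bucket_labels <- c("0-2", "2-4", "4-6", "6-8", "8-10", "10-12",
                   "12-14", "14-16", "16-18", "18-20", ">20")

buckets <- cut(calls,
               breaks = c(fixed_breaks, Inf),  # 12 breaks give 11 intervals
               labels = bucket_labels,         # one label per interval
               include.lowest = TRUE)          # keep values equal to 0

print(data.frame(calls, buckets))
```

With this approach the number of intervals does not depend on the data, so the labels vector always has the right length. Inside the pipeline from the question, the same cut() call should be usable within mutate() with Calls_per_Hour in place of calls.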