Is there some functionality in R that will pretty-print a numeric vector converted into a factor when some values fall outside the breaks? The desired input and output are:
data <- seq(5, 95, 10)
result <- cutSpecial(data, breaks = c(30, 40, 50, 60, 70))
disc <- c("<30", "<30", "<30", "[30, 40)", "[40, 50)", "[50, 60)", "[60, 70)",
          ">70", ">70", ">70")
cbind(data, disc)
data disc
[1,] "5" "<30"
[2,] "15" "<30"
[3,] "25" "<30"
[4,] "35" "[30, 40)"
[5,] "45" "[40, 50)"
[6,] "55" "[50, 60)"
[7,] "65" "[60, 70)"
[8,] "75" ">70"
[9,] "85" ">70"
[10,] "95" ">70"
The base R cut function simply turns values outside of the range into unsatisfying NA. What function in the R ecosystem would cutSpecial be?
It would be chop() from my santoku package:
library(santoku)
data <- seq(5, 95, 10)
chop(data, c(30, 40, 50, 60, 70))
## [1] [5, 30) [5, 30) [5, 30) [30, 40) [40, 50) [50, 60) [60, 70) [70, 95] [70, 95]
## [10] [70, 95]
## Levels: [5, 30) [30, 40) [40, 50) [50, 60) [60, 70) [70, 95]
If you want specific labels you can either pass them in yourself:
chop(data, c(30, 40, 50, 60, 70), labels = c("< 30", "[30-40)", "[40-50)", "[50-60)", "[60-70)", ">= 70"))
Or, in the latest version, you can use lbl_dash() and specify first and last:
chop(data, c(30, 40, 50, 60, 70), labels = lbl_dash(first = "< 30", last = ">= 70"))
## [1] < 30 < 30 < 30 30 - 40 40 - 50 50 - 60 60 - 70 >= 70 >= 70 >= 70
## Levels: < 30 30 - 40 40 - 50 50 - 60 60 - 70 >= 70
There's no such argument for the default interval labels, but maybe there should be.
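If you would rather keep interval-style labels and stay in base R, one workaround is to pad the breaks with -Inf and Inf so that cut() never produces NA for out-of-range values. The following is only a minimal sketch, assuming the breaks are sorted in increasing order. cutSpecial is the hypothetical name from the question, not a function in santoku or base R, and the ">=70" label is my own choice because the last interval includes 70.

```r
data <- seq(5, 95, 10)
breaks <- c(30, 40, 50, 60, 70)

# Hypothetical helper (name taken from the question, not from any package).
# Assumes `breaks` is sorted in increasing order.
cutSpecial <- function(x, breaks) {
  n <- length(breaks)
  # Labels for the inner, left-closed intervals, e.g. "[30, 40)"
  inner <- sprintf("[%s, %s)", breaks[-n], breaks[-1])
  # Open-ended labels for values below the first and at/above the last break
  labels <- c(paste0("<", breaks[1]), inner, paste0(">=", breaks[n]))
  # Padding with -Inf/Inf means every finite value lands in some interval
  cut(x, breaks = c(-Inf, breaks, Inf), labels = labels, right = FALSE)
}

result <- cutSpecial(data, breaks)
cbind(data, disc = as.character(result))
```

Here right = FALSE makes the intervals closed on the left, which matches the [30, 40) style in the question.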