
R cut function: how to bin data so that the bins include the exact lowest and highest boundaries


I am a beginner in R. I used the cut function to bin my data. My data starts at 0, but after cutting, the lowest bin boundary is negative, and I have no idea why.

My code is:

cancer_rtcl$cancer_rate_cut=cut(cancer_rtcl$rate,6)

The statistical summary of my data is:

 > summary(cancer_rtcl$rate)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0    13.3    16.5    16.4    18.8    63.5 

> dput(cancer_rtcl$rate)
c(63.5, 41.5, 36, 33.9, 29.7, 27.2, 27.2, 26, 25.9, 25.9, 25.3, 
25.1, 24.6, 24.3, 23.6, 23.3, 22.8, 22.7, 22.5, 22.4, 22.3, 22.3, 
21.9, 21.9, 21.7, 21.6, 21.5, 21.4, 21.3, 21.2, 21.2, 20.9, 20.8, 
20.7, 20.5, 20.5, 20.3, 20.2, 20, 19.7, 19.7, 19.6, 19.6, 19.5, 
19.4, 19.1, 19, 19, 19, 18.9, 18.9, 18.8, 18.8, 18.8, 18.8, 18.8, 
18.7, 18.5, 18.5, 18.5, 18.4, 18.3, 18.3, 18.2, 18.2, 18.2, 18.1, 
18.1, 18, 17.9, 17.9, 17.9, 17.8, 17.8, 17.8, 17.7, 17.7, 17.6, 
17.6, 17.6, 17.5, 17.4, 17.4, 17.3, 17.3, 17.3, 17.3, 17.3, 17.2, 
17.2, 17.1, 17.1, 17.1, 17, 17, 16.9, 16.9, 16.9, 16.8, 16.8, 
16.7, 16.6, 16.6, 16.6, 16.5, 16.5, 16.5, 16.5, 16.5, 16.4, 16.4, 
16.4, 16.4, 16.2, 16.1, 16, 16, 16, 16, 15.9, 15.9, 15.8, 15.8, 
15.7, 15.7, 15.7, 15.7, 15.6, 15.6, 15.6, 15.6, 15.6, 15.5, 15.4, 
15.4, 15.4, 15.3, 15.3, 15.3, 15.3, 15.2, 15.1, 15.1, 15, 15, 
14.8, 14.6, 14.6, 14.4, 14.2, 14.2, 14.1, 14.1, 14.1, 14.1, 14, 
13.9, 13.8, 13.7, 13.6, 13.6, 13.6, 13.3, 13.2, 13.2, 13.1, 13.1, 
13, 12.9, 12.9, 12.7, 12.6, 12.5, 12.4, 12.3, 12.3, 12.2, 12, 
11.9, 11.8, 11.6, 11.6, 11.4, 11.4, 11.3, 11, 10.8, 10.8, 10.7, 
10.6, 10.5, 10.2, 9.9, 9.8, 9.7, 9.7, 9.6, 9.6, 9.5, 9.3, 9.2, 
9.2, 9, 9, 8, 7.9, 7.3, 7.1, 7, 6.9, 6.3, 4.6, 0, 0, 0, 0, 0)

But the cutting result is:

6 Levels: (-0.0635,10.6] (10.6,21.2] (21.2,31.8] (31.8,42.3] ... (52.9,63.6]

As you can see, the lowest boundary is negative, which is not ideal because I need to make a map based on the binned data.

I also tried specifying the breaks explicitly:

cancer_rtcl$rate_cut=cut(cancer_rtcl$rate,c(5,10,15,20,25))

But this way, I lose the data larger than 25.

Can anyone help me figure out how to bin the data so that the bins have the exact lowest and highest boundaries? Thanks!


Solution

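  • Adding Inf as the last break should capture the data larger than 25: cancer_rtcl$rate_cut1 <- cut(cancer_rtcl$rate, c(5, 10, 15, 20, 25, Inf)). Note that with these breaks, values at or below 5 (including the zeros) still fall outside every interval and become NA.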
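
As for the negative boundary: when breaks is a single number, cut splits the range of the data into that many equal pieces and then moves the two outer limits outward by 0.1% of the range, so that the minimum and maximum both land inside a bin. With a range of 63.5, that gives 0 - 0.0635 = -0.0635. A minimal sketch of two ways around this follows. It uses a small illustrative subset of the rates rather than the full column, and the variable names are made up for the example.

```r
# Illustrative subset of the rates from the question (not the full data set)
rate <- c(0, 4.6, 13.3, 16.5, 18.8, 25.9, 41.5, 63.5)

# Option 1: six equal-width bins whose outer breaks are exactly min and max.
# Passing explicit breaks avoids the 0.1% padding that cut(x, 6) applies.
equal_breaks <- seq(min(rate), max(rate), length.out = 7)
equal_bins <- cut(rate, breaks = equal_breaks, include.lowest = TRUE)

# Option 2: hand-picked breaks starting at 0 and open-ended at the top.
# include.lowest = TRUE closes the first interval so the zeros are kept.
custom_breaks <- c(0, 5, 10, 15, 20, 25, Inf)
custom_bins <- cut(rate, breaks = custom_breaks, include.lowest = TRUE)

levels(equal_bins)   # inspect the bin labels
table(custom_bins)   # counts per bin; no value should be dropped
```

Either result can be assigned back to the data frame the same way as in the question, for example cancer_rtcl$rate_cut <- cut(cancer_rtcl$rate, breaks = custom_breaks, include.lowest = TRUE). Without include.lowest = TRUE the first interval is open on the left, so values equal to the lowest break would become NA.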