I would like to get the index values of the bins in a histogram generated via hist()
Example and details follow:
testhist <- hist(rnorm(1000, 1000, 100), n = 5000, xlim = c(0,5000), probability = TRUE)
gives testhist$density, which are my 'y' values. So, in the code I define n = 5000, that is 5000 bins across x 0:5000. I would like to get the index value of the histogram bin each 'y' value corresponds to.
i.e:
Bin Index | 'y' value
1 0
1 0.000005
1 0
1 0
1 0.0000001
2 0.00002
3 0
3 0.0002
...5000
Any assistance is appreciated.
EDIT: as commenters pointed out, n= is an approximation. So, let's do this:
testhist <- hist(rnorm(1000, 1000, 100), breaks = seq(0,5000, by = 5), xlim = c(0,5000), probability = TRUE)
Now you have 1000 exact bins. How do I get the index of the bin corresponding to a 'y' value? I.e., bin 1, which has a range of 0:5, has what y values in it?
EDIT 2: Each bin corresponds to a single density; the more bins there are, the more representative of the data the histogram would be. Thanks for steering me in the right direction.
There's a bit of confusion about what hist does or doesn't do here.
There is no n= argument to hist, only breaks=. I think it gives the same result by chance, since pretty() uses n= and that function is used to define the bins.
breaks=5000 does not guarantee 5000 bins, as @Onyambu notes, due to pretty()-ification of the break-points. From ?hist: "...the number is a suggestion only; as the breakpoints will be set to pretty values."
testhist$density gives a density in each bin. You can verify this with:
set.seed(1)
x <- rnorm(1000, 1000, 100)
testhist <- hist(x, n=5000, xlim = c(0,5000), probability = TRUE)
length(testhist$mids)
#[1] 6820
length(testhist$density)
#[1] 6820
length(testhist$breaks)
#[1] 6821
6820 midpoints of bins, 6820 corresponding densities, and 6821 breaks since you need n+1 breaks to give n bins.
The original 1000 data-points are represented in these 6820 bins, with many of the counts and corresponding densities being zero.
sum(testhist$counts)
#[1] 1000
sum(testhist$counts == 0)
#[1] 5954
sum(testhist$density == 0)
#[1] 5954
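For comparison, here is a minimal sketch of the explicit-breaks version from the question's edit. It assumes every value of x lies within 0:5000 (hist() errors otherwise), and it uses plot = FALSE only to skip the drawing; the names brks and h are just illustrative.

```r
set.seed(1)
x <- rnorm(1000, 1000, 100)

# Explicit break points 0, 5, 10, ..., 5000, so nothing is left to pretty()
brks <- seq(0, 5000, by = 5)
h <- hist(x, breaks = brks, plot = FALSE)

length(h$breaks)   # 1001 break points
length(h$density)  # 1000 bins, one density per bin

# Densities integrate to 1 over the bins (density * bin width)
sum(h$density * diff(h$breaks))
```

With a vector of breaks the number of bins is exactly length(brks) - 1, so the position of each density in h$density can be used directly as the bin index.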
If you want to know which original value of x corresponds with which bin, you can do:
cut(x, testhist$breaks, labels=FALSE)
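Going the other way, from bin index to 'y' value, could look something like the sketch below. It assumes the explicit breaks from the question's edit and hist()'s default interval conventions (right-closed bins, lowest break included), which is why include.lowest = TRUE is passed to cut(). The names bins, idx and x_density are made up for illustration.

```r
set.seed(1)
x <- rnorm(1000, 1000, 100)
h <- hist(x, breaks = seq(0, 5000, by = 5), plot = FALSE)

# One row per bin: its index, its edges and its density (the 'y' value)
bins <- data.frame(
  bin     = seq_along(h$density),
  lower   = head(h$breaks, -1),
  upper   = tail(h$breaks, -1),
  density = h$density
)

# Bin index of every original observation
idx <- cut(x, h$breaks, labels = FALSE, include.lowest = TRUE)

# Density of the bin each observation falls into
x_density <- h$density[idx]

# Sanity check: counting observations per bin index should match hist()'s counts
all(tabulate(idx, nbins = length(h$counts)) == h$counts)
```

Each bin has exactly one density, so bins[bins$density > 0, ] lists only the occupied bins together with their index and range.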