Tags: r, histogram, probability-density, density-plot

How to calculate the density in results of function hist() in R


I understand how to get the density values from this data. For example, the density 0.69 is obtained from counts / bin width = 3448 / (0.5 * 10000) = 0.6896, right?

set.seed(1234)
h <- hist(rbinom(10000, 10, 0.1), freq=FALSE)

str(h)
#List of 6
# $ breaks  : num [1:11] 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
# $ counts  : int [1:10] 3448 3930 0 1910 0 588 0 112 0 12
# $ density : num [1:10] 0.69 0.786 0 0.382 0 ...
# $ mids    : num [1:10] 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75
# $ xname   : chr "rbinom(10000, 10, 0.1)"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"

However, using the Temp column of R's built-in airquality dataset (airquality$Temp), I got

Temperature <- airquality$Temp
h = hist(Temperature)
str(h)
#List of 6
# $ breaks  : int [1:10] 55 60 65 70 75 80 85 90 95 100
# $ counts  : int [1:9] 8 10 15 19 33 34 20 12 2
# $ density : num [1:9] 0.0105 0.0131 0.0196 0.0248 0.0431 ...
# $ mids    : num [1:9] 57.5 62.5 67.5 72.5 77.5 82.5 87.5 92.5 97.5
# $ xname   : chr "Temperature"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"

Applying the same approach as before, counts / class width gives 8 / 5 = 1.6 rather than 0.0105. My question is: how are the density values (0.0105 0.0131 0.0196 0.0248 0.0431 ...) in this histogram calculated?


Solution

  • You need to divide the counts by both the total number of observations and the bin width, i.e. density = counts / (n * bin width). Here n = 153 and the bin width is 5:

    h$counts / nrow(airquality) / 5
    #> [1] 0.010457516 0.013071895 0.019607843 0.024836601 0.043137255 0.044444444
    #> [7] 0.026143791 0.015686275 0.002614379
    

    We can see this matches density:

    h$density
    #> [1] 0.010457516 0.013071895 0.019607843 0.024836601 0.043137255 0.044444444
    #> [7] 0.026143791 0.015686275 0.002614379
    

    The calculation is the same for your initial example:

    3448 / 10000 / 0.5
    #> [1] 0.6896
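
The same calculation can be written without hard-coding the bin width. The following is a minimal sketch, assuming every observation falls inside the breaks (as it does with the default breaks); the variable names `n`, `widths`, and `manual_density` are illustrative and not part of the `hist()` result. Using `diff(h$breaks)` should also cover histograms with unequal bin widths.

```r
# Sketch: recompute the densities stored by hist() by hand.
Temperature <- airquality$Temp
h <- hist(Temperature, plot = FALSE)  # plot = FALSE: compute only, draw nothing

n      <- sum(h$counts)   # total number of binned observations
widths <- diff(h$breaks)  # width of each bin (all 5 here)

# density = counts / (n * bin width)
manual_density <- h$counts / (n * widths)

# Should be TRUE: the manual values agree with what hist() stored
all.equal(manual_density, h$density)

# The bar areas (density * width) of a density histogram add up to 1
sum(h$density * widths)
```

This also shows why 8 / 5 = 1.6 is not a density: without dividing by the number of observations, the bar areas sum to the sample size rather than to 1.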