R density() function


Sorry if this question is trivial, but I see no solution: I've been using the density() function frequently, always without trouble, but now I'm working with a data set (let's call it tab) with many relatively small values, and suddenly density(tab) gives something like absolute frequencies. Any ideas what I did wrong?

(Note: Also hist(tab, freq = FALSE) gives something weird for tab.)

Remark: summary(tab) gives:

          Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
    -0.0042810  0.0002679  0.0011750  0.0071690  0.0049510  0.5839000

I'd also be very grateful for any general hint as to the circumstances under which density() does not give relative frequencies as y-values.


Solution

  • While I can't exactly reproduce your example, it looks to me like you have a huge outlier in your dataset. Your 3rd quartile is about 0.005, but the maximum value is 0.584. On the real axis, the distance from your minimum value to your 3rd quartile is about 0.009, while the distance from the 3rd quartile to the maximum value is about 0.579. That's roughly 60 times farther! As I understand it, density() picks a single bandwidth for the whole sample. In this case, the bandwidth is likely to be very small, given that most values are clustered close to 0. You might then get a very degenerate density plot, with two vertical lines, one on the left and one on the right. I was able to generate one such plot using:

    plot(density(c(rnorm(100, 0, 0.001), 100)))
    

    All I do is take a sample from a normal distribution with an SD of 0.001 and add an outlier, 100, to it. The density then looks something like this:

    [Figure: degenerate density plot]

    The density values look like they could be confused for frequencies, but they are not: a density is scaled so that the area under the curve is 1, so its height can be far above 1 when the values span a narrow range. Of course, if I remove the outlier, the estimated density becomes nicely bell-shaped:

    [Figure: regular density plot]

    So, it seems likely that you need to remove an outlier from your data.
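
To make the point above concrete, here is a minimal sketch using simulated data (not the asker's tab, which isn't available). The seed, the variable names, the helper area(), and the cutoff of 1 used to drop the outlier are all assumptions made for illustration. It checks three things: the default bandwidth stays tiny when one outlier is added, the heights of a density estimate can be far above 1, and the area under the clean estimate is still close to 1.

```r
# Sketch with simulated data; names, seed and cutoff are illustrative assumptions.
set.seed(1)  # arbitrary seed, only for reproducibility

x_clean   <- rnorm(100, mean = 0, sd = 0.001)  # tightly clustered values
x_outlier <- c(x_clean, 100)                   # same values plus one outlier

d_clean   <- density(x_clean)
d_outlier <- density(x_outlier)

# The default bandwidth rule (bw.nrd0) uses min(sd, IQR / 1.34). One outlier
# inflates the sd but barely moves the IQR, so the bandwidth stays tiny.
bw_clean   <- d_clean$bw
bw_outlier <- d_outlier$bw

# Hypothetical helper: trapezoidal area under an estimated density curve.
area <- function(d) sum(diff(d$x) * (d$y[-1] + d$y[-length(d$y)]) / 2)

# Heights are densities, not relative frequencies: the peak can be far
# above 1 while the area under the curve stays close to 1.
peak_clean <- max(d_clean$y)
area_clean <- area(d_clean)

# One way to follow the advice above: drop the extreme value first.
# The cutoff of 1 is a hypothetical choice for this simulated data only.
d_trimmed <- density(x_outlier[x_outlier < 1])

print(c(bw_clean = bw_clean, bw_outlier = bw_outlier))
print(c(peak_clean = peak_clean, area_clean = area_clean))
```

For real data the cutoff would have to come from looking at the values themselves, for example via summary(tab) or a boxplot, and whether dropping a point is justified depends on what the data represent.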