
Density plot produces too steep a curve


I have the following vector:

> dput(x)
c(-0.355351681957187, -0.169491525423729, 0.31683516598051, 0.283387622149837, 
  -0.0404040404040404, 0, -0.333333333333333, 0.0235294117647059, 
   0, 0, 0, -0.0515442883011552, -0.0217391304347826, -0.243119266055046, 
  -1, -0.34239692625979, -0.378787878787879, -1.66260162601626, 
   0, -0.157894736842105, 0.25, -0.5, 0.801104290693729, -0.153153153153153, 
   0.385314991342733, 0.214285714285714, 0.133333333333333, 0.677407583111338, 
   0.125, 0.0152671755725191, 0.00103734439834025, 0, 0.25, -0.181818181818182, 
   0, 0.555555555555556, -1.2671374117353, -0.72, -0.0896999113268307, 
  -0.0392156862745098, 0.987184805152276, 0.986975072984505, -0.120978120978121, 
  -0.554949337490257, -0.333333333333333, -1030.48879660578, 0.192660550458716, 
   0, 0, -1.04154941234895, -0.82051282051282, -0.0282485875706215, 
   0.63226571767497, 0.0881147540983607, 0, 0.458823529411765, 0.338449445639583, 
  -5.55556433141142, 0.225536180110256, 0.249441548771407, -0.11864406779661, 
  -3.76193507320178, -4.75, -1.10223741454319, -0.689922480620155, 
  -2.04782608695652, -3.04521276595745, -0.741007194244604, 0.989690721649485, 
   0.314224446032881, 0.314285714285714, 0.251685393258427, -0.00608418155402266, 
  -0.0368893320039882, -0.00990683783542832, -0.0166666666666667, 
  -0.0857142857142857, 0, 0.144337527757217, 0.221153846153846, 
  -0.0560747663551402, 0, -1.8, 0.2243947858473, 0.166666666666667, 
   0, 0.0344827586206897, 0.561461794019934, 0.458333333333333, 
  -1.2921686746988, -1.20289855072464, -0.156601842374616, -0.144578313253012, 
  -0.0310077519379845, 0.163688058489033, 0.12621359223301, -481.395976223137, 
   0.376470588235294, -0.222222222222222, -0.209553158705701, -0.128205128205128, 
   0, 0.0693069306930693, 0.0293463761671854)

I plotted the density of x like this:

d0<-density(x)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
plot(df_density0$x,df_density0$y,type="l",col="red")
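One way to see why the peak turns into a spike is to inspect what density() actually computed. The sketch below uses a simulated stand-in for x (an assumption, not the real data): 98 values near 0 plus two extreme values resembling -1030.49 and -481.40 from the vector above. By default density() evaluates the estimate at 512 equally spaced points from min(x) - 3*bw to max(x) + 3*bw. With outliers that far out, the spacing between grid points becomes much larger than the bandwidth, so only a couple of grid points fall near 0.

```r
# Simulated stand-in for the question's x (an assumption, not the real data):
# 98 values near 0 plus two extreme values like -1030.49 and -481.40.
set.seed(42)
x <- c(rnorm(98, mean = 0, sd = 0.5), -1030.49, -481.40)

# Same call as in the question: Gaussian kernel, bw.nrd0 bandwidth,
# estimate evaluated at n = 512 equally spaced points.
d0 <- density(x)

# The grid runs from min(x) - 3 * bw to max(x) + 3 * bw,
# so it must span more than 1000 units to reach the outliers.
grid_width <- diff(range(d0$x))
grid_step  <- diff(d0$x[1:2])

# How many grid points land in the central bulk of the data
n_central <- sum(d0$x > -2 & d0$x < 2)

cat("bandwidth: ", d0$bw, "\n")
cat("grid width:", grid_width, "\n")
cat("grid step: ", grid_step, "\n")
cat("grid points in (-2, 2):", n_central, "\n")
```

The same checks can be run on the real x by skipping the simulated assignment.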

This produces the following plot: https://i.sstatic.net/HHlCU.png

The peak at 0 is very narrow and away from it the curve is flat, so the graph is hard to read. I thought about using a logarithmic scale to make the peak less steep and improve readability, but the data contain many zeros (as well as negative values), for which the logarithm is undefined.
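The zeros are not the only obstacle to a log scale: log() returns -Inf at 0 and NaN for the negative values in the vector. The sketch below shows this on four values copied from the question. It then shows a signed log transform, sign(z) * log(1 + |z|), which is a swapped-in technique not used in the original question or answer; the helper name signed_log is hypothetical.

```r
# Four values copied from the question's vector
v <- c(0, 0.25, -0.5, -1030.48879660578)

# Plain log: -Inf at 0 and NaN (with a warning) for negative values
print(suppressWarnings(log(v)))

# Hypothetical helper: signed log transform, sign(z) * log(1 + |z|).
# It is 0 at 0, keeps the sign, and compresses large magnitudes.
signed_log <- function(z) sign(z) * log1p(abs(z))
print(signed_log(v))
```

Calling density(signed_log(x)) would spread the values near 0 over more of the axis, at the cost of an x-axis expressed in transformed units.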


Solution

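  • The curve is so steep because your data contain a few extreme outliers, such as -1030.49 and -481.40. density() has to stretch the x-axis to cover them, so the bulk of the values, which lie between roughly -5.6 and 1, gets squeezed into a narrow spike around 0. You can identify the outliers by drawing a boxplot and storing the result as an object (assuming your vector is called dt; in the question it is x):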

    out <- boxplot(dt)  # store the boxplot as an object
    out$out             # inspect the outliers
     [1]    -1.0000000    -1.6626016    -1.2671374     0.9871848     0.9869751 -1030.4887966
     [7]    -1.0415494    -5.5555643    -3.7619351    -4.7500000    -1.1022374    -2.0478261
    [13]    -3.0452128     0.9896907    -1.8000000    -1.2921687    -1.2028986  -481.3959762
    

    You can then remove the outliers from dt and plot the remaining values with hist() while superimposing a density line. Note that freq must be set to FALSE so that the histogram shows densities rather than counts; otherwise the density line does not match its scale. Experiment with bw to change the shape of the density curve:

    hist(dt[!dt %in% out$out], freq = FALSE)
    lines(density(dt[!dt %in% out$out], kernel="cosine", bw = 0.1))
    

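The answer's steps can be sketched end to end as follows. The sketch again uses a simulated stand-in for the data (an assumption, not the question's vector). It swaps boxplot(dt)$out for boxplot.stats(dt)$out, which applies the same rule (points more than 1.5 box lengths beyond the box) without drawing the boxplot. As the out$out listing shows for the real data, this rule also flags moderate values such as -1 and 0.99, so more than the two extreme points get removed.

```r
# Simulated stand-in for the data (assumption): 98 values near 0
# plus two extreme values similar to the question's -1030.49 and -481.40.
set.seed(42)
dt <- c(rnorm(98, mean = 0, sd = 0.5), -1030.49, -481.40)

# boxplot.stats() computes the same outliers that boxplot() stores in $out,
# but without drawing anything.
outliers <- boxplot.stats(dt)$out
kept <- dt[!dt %in% outliers]

# Histogram on the density scale (freq = FALSE) so the density curve,
# which integrates to 1, shares the same y-axis.
hist(kept, freq = FALSE, main = "Without outliers", xlab = "dt")
lines(density(kept, kernel = "cosine", bw = 0.1), col = "red")

# Compare the width of the density grid before and after filtering
width_all  <- diff(range(density(dt)$x))
width_kept <- diff(range(density(kept, kernel = "cosine", bw = 0.1)$x))
cat("grid width with outliers:", width_all, "\n")
cat("grid width without them: ", width_kept, "\n")
```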