I know this must be pretty basic, but what is the proper, accurate way to plot the PDF of sample data that you know comes from a particular population distribution, for example data generated with rnorm() or rexp()?
I ask because I know a lot of people use density() and then pass the result to plot(), but the density() function seems too arbitrary to be accurate. For example, it assigns nonzero density to negative values for data drawn from an exponential distribution, which takes no negative values.
So could someone recommend a more fine-tuned method for plotting sample PDFs?
The density function performs kernel density estimation (KDE). To find the best KDE for your dataset, you should tune the bandwidth (parameter bw). Here's a paper that discusses KDE and bandwidth selection: http://www.stat.washington.edu/courses/stat527/s13/readings/Sheather_StatSci_2004.pdf
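To see why the bandwidth matters, here is a minimal sketch that passes two numeric values to bw and overlays the results on the true density. The seed, the sample size, and the values 0.05 and 1 are arbitrary choices for illustration, not recommendations.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
samp <- rnorm(200)                   # sample from a known standard normal

# Two KDEs of the same data with hand-picked (illustrative) bandwidths
narrow <- density(samp, bw = 0.05)   # small bandwidth: wiggly, undersmoothed
wide   <- density(samp, bw = 1)      # large bandwidth: flat, oversmoothed

# Overlay both estimates on the true density the sample was drawn from
plot(narrow, col = "red", main = "Effect of bandwidth on a KDE")
lines(wide, col = "blue")
curve(dnorm(x), add = TRUE, lty = 2) # true N(0, 1) density, dashed
legend("topright", legend = c("bw = 0.05", "bw = 1", "true density"),
       col = c("red", "blue", "black"), lty = c(1, 1, 2))
```

A good bandwidth sits somewhere between these two extremes, which is what the selection methods try to find automatically.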
Or, for a simpler approach, you can try out the different bandwidth selection methods that can be passed to bw:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/bandwidth.html
The current default, "nrd0", is there for historical reasons. I find "ucv" and "bcv" have worked better for my datasets.
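As a rough sketch of how you might compare these selectors on your own data, the following fits a KDE with each one and plots them against the known density. The exponential sample, its rate, and the sample size are assumptions made up for the example, and which selector looks best will depend on the data.

```r
set.seed(42)                              # arbitrary seed, for reproducibility
samp <- rexp(500, rate = 1)               # sample from a known exponential

# Fit one KDE per bandwidth selector (names are passed to bw as strings)
selectors <- c("nrd0", "ucv", "bcv")
fits <- lapply(selectors, function(b) density(samp, bw = b))
names(fits) <- selectors

# Bandwidth actually chosen by each selector
print(sapply(fits, function(f) f$bw))

# True density first, then each estimate on top of it
curve(dexp(x, rate = 1), from = -1, to = 6, lwd = 2,
      xlab = "x", ylab = "density", main = "KDE vs. true density")
for (i in seq_along(fits)) {
  lines(fits[[i]], col = i + 1, lty = i + 1)
}
legend("topright", legend = c("true density", selectors),
       col = 1:4, lty = 1:4, lwd = c(2, 1, 1, 1))
```

Changing the bandwidth alone will not remove the mass that the estimate places below zero for the exponential sample, but it does change how closely the estimate follows the true curve elsewhere.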