I have data from two samples and I want to plot their frequency distributions in R. I already have a reference plot done in Excel.
I loaded the data into R as HistSerp. It's 136 obs. of 2 variables.
summary(HistSerp)
       V1              V2
 Min.   :0.000   Min.   :0.0000
 1st Qu.:0.000   1st Qu.:0.3752
 Median :0.000   Median :1.2845
 Mean   :0.055   Mean   :1.2144
 3rd Qu.:0.082   3rd Qu.:1.9952
 Max.   :1.082   Max.   :2.9800
class(HistSerp$V1)
"numeric"
class(HistSerp$V2)
"numeric"
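If I melt the data and plot it with
library(reshape2)  # assuming melt() comes from reshape2
library(ggplot2)
HistSerp.m <- melt(HistSerp)
ggplot(HistSerp.m) +
  geom_freqpoly(aes(x = value, y = ..density.., colour = variable))
the plot looks different from the Excel one.
I don't know why the y-axis spans those values, and I'm not sure whether it's only a y-axis labeling problem, because the shape of the plot itself also seems different.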
I've also tried geom_density(), hist(HistSerp$V1, freq=FALSE), etc., but I can't get what I expect; I get the same result as before. I guess there's something wrong with my data, but I can't figure out what it is. Any help will be appreciated.
Thanks
P.S. Should I copy the data (136x2)?
Update: The data. Sorry if there's a better way to copy it...
0.144 2.024
0.082 2.548
0.082 1.943
0.000 2.599
0.000 2.233
0.000 2.342
0.082 1.655
0.082 2.200
0.000 2.261
0.000 2.408
0.000 2.127
0.000 2.053
0.000 1.929
0.000 1.413
0.000 2.400
0.000 2.777
0.000 2.685
0.000 1.436
0.000 1.573
0.000 2.504
0.000 1.533
0.000 1.434
0.000 1.421
0.000 2.534
0.082 1.728
0.000 1.984
0.082 1.287
0.000 2.324
0.164 2.405
0.279 1.989
0.082 2.729
0.144 2.046
0.226 2.496
0.000 2.980
0.000 2.634
0.000 1.792
0.000 1.571
0.000 0.612
0.000 0.884
0.000 0.449
0.000 2.318
0.082 0.449
0.000 0.449
0.000 0.563
0.082 0.919
0.000 0.617
0.082 1.297
0.144 0.719
0.000 1.897
0.000 1.338
0.000 0.337
0.000 1.555
0.000 0.273
0.291 0.656
0.000 0.273
0.082 0.388
0.082 1.911
0.082 0.852
0.000 1.580
0.000 1.450
0.000 1.209
0.000 2.049
0.082 2.694
0.082 1.089
0.246 2.643
0.000 2.393
0.000 1.702
0.000 2.595
0.000 1.432
0.000 2.094
0.000 1.526
0.082 1.775
0.000 0.273
0.000 1.405
0.000 2.014
0.000 0.543
0.000 0.586
0.000 1.224
0.000 0.719
0.164 0.201
0.000 0.388
0.082 0.232
0.000 0.116
0.000 0.116
0.082 1.395
0.000 0.116
0.000 0.232
0.082 0.844
0.000 1.153
0.082 0.000
0.667 0.000
0.000 1.535
0.000 2.687
0.000 0.922
0.226 0.337
0.197 0.999
1.082 1.373
0.082 0.396
0.082 0.116
0.000 1.667
0.000 0.731
0.000 0.544
0.082 2.072
0.000 2.262
0.164 2.111
0.082 1.675
0.000 0.116
0.000 0.232
0.082 0.116
0.000 1.004
0.000 0.116
0.164 0.116
0.082 0.699
0.000 0.000
0.000 0.273
0.082 0.000
0.000 0.388
0.082 0.000
0.000 0.116
0.000 0.273
0.000 0.000
0.000 0.649
0.164 0.000
0.082 0.000
0.082 0.000
0.000 0.000
0.082 0.000
0.144 1.282
0.000 1.772
0.000 0.116
0.082 0.000
0.000 1.416
0.000 0.563
0.082 0.510
0.000 0.316
0.164 1.124
You have a couple of options:
geom_freqpoly(aes(y = ..count.. / sum(..count..)))
which plots each bin's share of the observations and is probably what you want. Then there's:
geom_freqpoly(aes(y = ..ndensity..))
which is the density estimate rescaled so that its maximum is 1 (i.e. it will always range from 0 to 1). And finally, the associated:
geom_freqpoly(aes(y = ..ncount..))
which is similar, but for the counts. You can read about the options at ?stat_bin.
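To see why ..density.. can run well above 1 while a relative frequency never does, here is a minimal base R sketch of how I understand these quantities to relate. The vector x is a made-up toy sample (not your data), and the bin width of 0.1 is an assumption chosen only to make the effect visible.

```r
# Toy values only (not the poster's data): mostly zeros, loosely like V1.
x <- c(0, 0, 0, 0, 0, 0, 0.082, 0.082, 0.164, 1.082)

binwidth <- 0.1                          # assumed bin width, smaller than 1
breaks   <- seq(0, 1.2, by = binwidth)   # 12 equal-width bins covering x
h        <- hist(x, breaks = breaks, plot = FALSE)

n        <- length(x)
count    <- h$counts
dens     <- count / (n * binwidth)  # area under the bars is 1, so heights can exceed 1
relfreq  <- count / sum(count)      # share of observations per bin, sums to 1
ncount   <- count / max(count)      # counts rescaled so the tallest bin is 1
ndensity <- dens / max(dens)        # density rescaled so the tallest bin is 1

print(data.frame(mid = h$mids, count, dens, relfreq, ncount, ndensity))
```

In this toy sample the first bin holds 8 of the 10 values, so its density is 8 / (10 * 0.1) = 8 while its relative frequency is 0.8. That is likely what happens with V1, where most values sit at or near zero. One caveat I would double-check in ?stat_bin: with colour = variable, sum(..count..) appears to be taken over both groups together, so each line would add up to 0.5 rather than 1. If that matters, ..density.. * ..width.. should give the per-group fraction instead.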