I'm using ggplot / easyGgplot2 to create density plots of two groups. I would like to have a metric or indication of how much the two curves intersect. I would also accept a solution that does not use the curves at all, as long as it gives me a measure of which groups are more distinct (there are several different groups of data).
Is there any easy way to do this in R?
For example, this sample generates a density plot with one filled curve per sex:
library(easyGgplot2)
ggplot2.density(data=weight, xName='weight', groupName='sex',
                legendPosition="top",
                alpha=0.5, fillGroupDensity=TRUE )
How can I estimate the percentage of area that is common to both curves?
I like the previous answer, but this may be a bit more intuitive. I also made sure to use a common bandwidth:
library ( "caTools" )
# Extract common bandwidth
Bw <- ( density ( iris$Petal.Width ))$bw
# Get iris data
Sample <- with ( iris, split ( Petal.Width, Species ))[ 2:3 ]
# Estimate kernel densities using common bandwidth
Densities <- lapply ( Sample, density,
                      bw = Bw,    # the common bandwidth extracted above (R is case-sensitive)
                      n = 512,
                      from = -1,
                      to = 3 )
# Plot
plot( Densities [[ 1 ]], xlim = c ( -1, 3 ),
col = "steelblue",
main = "" )
lines ( Densities [[ 2 ]], col = "orange" )
# Overlap
X <- Densities [[ 1 ]]$x
Y1 <- Densities [[ 1 ]]$y
Y2 <- Densities [[ 2 ]]$y
Overlap <- pmin ( Y1, Y2 )
polygon ( c ( X, X [ 1 ]), c ( Overlap, Overlap [ 1 ]),
          lwd = 2, col = "hotpink", border = NA, density = 20)  # border = NA omits the border
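# Integrate with the trapezoidal rule (trapz from caTools).
# The overlap is expressed as a fraction of the summed area of both curves;
# each curve integrates to about 1, so this value is at most 0.5.
Total <- trapz ( X, Y1 ) + trapz ( X, Y2 )
(Surface <- trapz ( X, Overlap ) / Total)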
SText <- paste ( sprintf ( "%.3f", 100*Surface ), "%" )
text ( X [ which.max ( Overlap )], 1.2 * max ( Overlap ), SText )
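Since the question mentions several groups, here is a rough sketch of how the same idea could be applied to every pair of groups using base R only. Note that it swaps in a slightly different normalization than the code above: it reports the integral of the pointwise minimum of the two densities (often called the overlapping coefficient), which runs from 0 for no overlap to about 1 for identical curves, instead of dividing by the summed area. The names `overlap_coef` and `rng` are made up for this illustration, and padding the grid by three bandwidths on each side is just an assumption to keep the tails inside the integration range.

```r
# Overlap between two samples: integral of min(f1, f2).
# Both densities use the same bandwidth and the same evaluation grid.
overlap_coef <- function(a, b, bw, from, to, n = 512) {
  d1 <- density(a, bw = bw, from = from, to = to, n = n)
  d2 <- density(b, bw = bw, from = from, to = to, n = n)
  y  <- pmin(d1$y, d2$y)
  # Trapezoidal rule in base R, so caTools is not needed here
  sum(diff(d1$x) * (head(y, -1) + tail(y, -1)) / 2)
}

# One numeric vector per group
groups <- split(iris$Petal.Width, iris$Species)

# Common bandwidth from the pooled data, as in the code above
bw <- density(iris$Petal.Width)$bw

# Assumption: pad the data range by 3 bandwidths on each side
rng <- range(iris$Petal.Width) + c(-3, 3) * bw

# Overlap for every pair of groups
pairs <- combn(names(groups), 2)
overlaps <- apply(pairs, 2, function(p)
  overlap_coef(groups[[p[1]]], groups[[p[2]]], bw, rng[1], rng[2]))
names(overlaps) <- apply(pairs, 2, paste, collapse = " vs ")

round(overlaps, 3)
```

Pairs with a smaller value are more distinct, so sorting `overlaps` gives a simple ranking of which groups are easiest to tell apart. The result depends on the bandwidth, so it is probably worth checking how stable the ranking is for a couple of different `bw` values.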