Search code examples
rdistributiondifferencekernel-density

Calculating and plotting differences in kernel density distributions in r


I'm using R and I want to calculate the differences between two Kernel density distributions at each point on the x axis and plot that difference but am having some trouble. Is there a certain function or way that I can do this? For context, I'm using blood pressure data and I want to calculate the differences at each point in the blood pressures between men and women.

My code for the distributions (not the differences) looks something like this (SBP=systolic blood pressure):

km <- density(data$SBP[data$GENDER==0], bw="nrd0", adjust = 1, kernel = c("gaussian"), window = kernel, n=512, cut=3, give.Rkern = FALSE, na.rm=FALSE)
kf <- density(data$SBP[data$GENDER==1], bw="nrd0", adjust = 1, kernel = c("gaussian"), window = kernel, n=512, cut=3, give.Rkern = FALSE, na.rm=FALSE)

plot(km, xlab="SBP", main="SBP Distribution of Men & Women", col="blue")
lines(kf, col="green")

I am completely new to all this! I'm pretty sure my exact question has also not been asked here but please lead me to any other resources that may help. Thanks.


Solution

  • The density objects have elements x and y elements that store the x-axis and distribution function values repectively. If you use the same from and to arguements for both density() calls, then the x values calculated should be the same.

    Store the x-y values for both densities in a dataframe an d then merge/join them on x, then you can calculate the difference and plot them:

    x <- rnorm(1000,0,1)
    y <- rnorm(1000,1,1)
    fx <- density(x,from = -5,to=5)
    fy <- density(y,from = -5,to=5)
    plot(fx,col='blue',main="SBP Distribution of Men & Women")
    lines(fy, col="green")
    
    dfx <- data.frame(x=fx$x,
                      fx=fx$y)
    
    dfy <- data.frame(x=fy$x,
                      fy=fy$y)
    
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    library(ggplot2)
    

    inner_join(dfx,dfy,on='x') %>% 
      mutate(diff=fx-fy) %>% 
      ggplot()+
      geom_line(aes(x=x,y=diff))
    #> Joining, by = "x"
    

    Created on 2020-03-10 by the reprex package (v0.3.0)