Adding two kernel density objects in R?

Suppose we have two objects created using the density() function. Is there a way to add these two objects to get another density (or similar) object?

For example:

A = rnorm(100)
B = rnorm(1000)
dA = density(A)
dB = density(B)
dC = density(c(A, B))

Is there a way to get the dC object from the dA and dB objects? Some kind of sum operation?

Solution

The object returned by density() is a list with these components (the output below came from a shorter A, which is why n is 4 rather than 100):

> str(dA)
List of 7
 $ x        : num [1:512] -3.67 -3.66 -3.65 -3.64 -3.63 ...
 $ y        : num [1:512] 0.00209 0.00222 0.00237 0.00252 0.00268 ...
 $ bw       : num 0.536
 $ n        : int 4
 $ call     : language density.default(x = A)
 $ data.name: chr "A"
 $ has.na   : logi FALSE
 - attr(*, "class")= chr "density"

Note that the original data isn't stored in the object, so we can't recover it and simply do something like dAB = density(c(dA$data, dB$data)).

The x and y components trace the density curve, which you can plot with plot(dA$x, dA$y). You might think all you need to do is add the y values from two density objects, but there's no guarantee they'll be evaluated at the same x points.
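A quick way to see this, as a sketch (the seed is an arbitrary choice for reproducibility): by default density() evaluates the estimate at n = 512 equally spaced points running from min(x) - cut * bw to max(x) + cut * bw, with cut = 3, so each grid depends on that sample's range and bandwidth.

```r
set.seed(42)          # arbitrary seed, only for reproducibility
A  <- rnorm(100)
B  <- rnorm(1000)
dA <- density(A)
dB <- density(B)

length(dA$x)          # 512 evaluation points by default
identical(dA$x, dB$x) # FALSE: the two grids differ
range(dA$x)           # min(A) - 3 * dA$bw to max(A) + 3 * dA$bw
range(dB$x)           # min(B) - 3 * dB$bw to max(B) + 3 * dB$bw
```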

So maybe you think you can interpolate one onto the other's x points and then add the y values. But the sum of two densities integrates to 2, not 1 as a proper density should. What you should do instead is weight dA$y and dB$y by the fraction of points behind each component density, which you can get from the n components (dA$n and dB$n).
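To check the integration claim numerically, here is a minimal sketch. trapz() is a small hypothetical helper (trapezoidal rule), not a base R function; the shared grid of 1024 points and treating each curve as 0 outside its own grid are assumptions made for illustration.

```r
# Trapezoidal rule over (x, y) pairs; hypothetical helper, not in base R
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

set.seed(42)                      # arbitrary seed
dA <- density(rnorm(100))
dB <- density(rnorm(1000))

# Put both curves on one shared grid (0 outside each object's own range)
x  <- seq(min(dA$x, dB$x), max(dA$x, dB$x), length.out = 1024)
yA <- approx(dA$x, dA$y, xout = x, yleft = 0, yright = 0)$y
yB <- approx(dB$x, dB$y, xout = x, yleft = 0, yright = 0)$y

naive_mass    <- trapz(x, yA + yB)           # about 2: not a density
wA <- dA$n / (dA$n + dB$n)                   # 100 / 1100
wB <- dB$n / (dA$n + dB$n)                   # 1000 / 1100
weighted_mass <- trapz(x, wA * yA + wB * yB) # about 1
```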

If you don't understand that last point, consider the following two densities, one from 1000 points and one from 500:

dA = density(runif(1000))
dB = density(runif(500)+10)

The first is a uniform between 0 and 1, the second a uniform between 10 and 11. Both uniforms have height 1 and their ranges don't overlap, so if you added them you'd get two steps of equal height. But the density of their union:

dAB = density(c(runif(1000), runif(500)+10))

is a density with twice as much mass between 0 and 1 as between 10 and 11. When adding densities estimated from samples, you need to weight each one by its sample size.
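The mass comparison can be checked directly, as a sketch. The seed, the names u1 and u2, the trapz() helper (hypothetical, not base R) and the split point x = 5 (anywhere between the two clumps would do) are all choices made here for illustration.

```r
# Trapezoidal rule over (x, y) pairs; hypothetical helper, not in base R
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

set.seed(1)                  # arbitrary seed
u1 <- runif(1000)            # uniform on [0, 1]
u2 <- runif(500) + 10        # uniform on [10, 11]
dA  <- density(u1)
dB  <- density(u2)
dAB <- density(c(u1, u2))    # pooled estimate

# Pooled estimate: mass left vs right of x = 5
left <- dAB$x < 5
ratio_pooled <- trapz(dAB$x[left], dAB$y[left]) /
                trapz(dAB$x[!left], dAB$y[!left])   # close to 1000/500 = 2

# Separate estimates: each integrates to about 1, so an unweighted sum
# would give the two clumps equal mass
ratio_separate <- trapz(dA$x, dA$y) / trapz(dB$x, dB$y)  # close to 1
```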

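So if you interpolate them onto the same x values and then sum the y values weighted by the n values, you get something that approximates density(c(A, B)). It is only an approximation: density(c(A, B)) chooses a single bandwidth for the pooled data, while dA and dB each used their own.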
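The whole recipe can be sketched as a small function. combine_densities() is a hypothetical helper, not part of base R. It assumes both inputs come from stats::density() (so each carries $x, $y and $n), uses linear interpolation via approx(), and treats each estimate as 0 outside its own grid. Setting bw to NA is a choice made here, since no single bandwidth describes the mixture.

```r
# Hypothetical helper (not in base R): combine two stats::density() results
# into an approximate density of the pooled sample, weighting by sample size.
combine_densities <- function(d1, d2, n_grid = 512) {
  stopifnot(inherits(d1, "density"), inherits(d2, "density"))
  # Common grid covering both original grids
  x <- seq(min(d1$x, d2$x), max(d1$x, d2$x), length.out = n_grid)
  # Linear interpolation onto the common grid; 0 outside each object's grid
  y1 <- approx(d1$x, d1$y, xout = x, yleft = 0, yright = 0)$y
  y2 <- approx(d2$x, d2$y, xout = x, yleft = 0, yright = 0)$y
  # Weights = fraction of all observations behind each estimate
  n <- d1$n + d2$n
  y <- (d1$n / n) * y1 + (d2$n / n) * y2
  # Reuse d1 as a template so the result keeps class "density"
  out <- d1
  out$x <- x
  out$y <- y
  out$n <- n
  out$bw <- NA_real_          # no single bandwidth describes the mixture
  out$call <- match.call()
  out$data.name <- paste(d1$data.name, "+", d2$data.name)
  out
}

set.seed(42)                  # arbitrary seed
A  <- rnorm(100)
B  <- rnorm(1000)
dA <- density(A)
dB <- density(B)
dAB <- combine_densities(dA, dB)
dC  <- density(c(A, B))       # direct estimate, for comparison

plot(dC, main = "density(c(A, B)) vs weighted combination of dA and dB")
lines(dAB, col = "red", lty = 2)
```

Because the result keeps the "density" class, print() and plot() still work on it. The red dashed curve should sit close to, but not exactly on, the direct estimate, with the gap coming mostly from the different bandwidths.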