Search code examples
rprobability-densitydensity-plot

R cdplot() - does the right axis show probability or density?


Reproducible data:

## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## CD plot
cdplot(fail ~ temperature)

The documentation for cdplot says:

cdplot computes the conditional densities of x given the levels of y weighted by the marginal distribution of y. The densities are derived cumulatively over the levels of y. The conditional probabilities are not derived by discretization (as in the spinogram), but using a smoothing approach via density.The conditional density functions (cumulative over the levels of y) are returned invisibly.

So on the plot where x = 63, y = 0.4 (approximately). Is this probability, or probability density? I am confused by the documentation as to what is calculated, what is returned and what is plotted.


Solution

  • The plot shows the probability of an outcome for a given temperature.

    What the docs are saying is that a standard density distribution is calculated for temperature measurements, and a density is worked out separately for temperature when fail is 'no'. If we divide the density of "no" temperatures by the density of all temperatures, then weight this by the proportion of 'no' temperatures, then we will get an estimate of the probability of drawing a "no" at a given temperature.

    To show this is the case, let's see the cdplot:

    cdplot(fail ~ temperature)
    

    enter image description here

    Now let's calculate the probabilities from the marginal densities manually and plot. We should get a near-identical shape to our curve

    all <- density(temperature, from = min(temperature), to = max(temperature))
    
    no  <- density(temperature[fail == "no"], from = min(temperature), 
                     to = max(temperature))
    
    probs <- no$y/all$y * proportions(table(fail))[1]
    
    plot(all$x, 1 - probs, type = "l", ylim = c(0, 1))
    

    enter image description here