Tags: r, ggplot2, cdf

R ggplot: Weighted CDF


I'd like to plot a weighted CDF using ggplot. Some older discussions outside Stack Overflow (e.g. this from 2012) suggest this is not possible, but I thought I'd raise the question again.

For example, consider this data:

df <- data.frame(x=sort(runif(100)), w=1:100)

I can show an unweighted CDF with

ggplot(df, aes(x)) + stat_ecdf()

How would I weight this by w? For this example, I'd expect an x^2-looking function, since the larger numbers have higher weight.


Solution

  • There is a mistake in your answer.

    This is the right code to compute the weighted ECDF:

    df <- df[order(df$x), ]  # Won't change anything since it was created sorted
    df$cum.pct <- with(df, cumsum(w) / sum(w))
    ggplot(df, aes(x, cum.pct)) + geom_line()
    

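    The weighted ECDF is a function F(a) equal to the sum of the weights (probabilities) of the observations with x <= a, divided by the total sum of weights. This matches the code above, where each cumulative sum includes the weight of the observation itself.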

    But here is a more satisfying option that simply modifies the original code of the ggplot2 stat_ecdf: https://github.com/NicolasWoloszko/stat_ecdf_weighted
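
If you would rather not depend on an external stat, the definition above can be sketched as a small helper that returns a right-continuous step function. The name weighted_ecdf is hypothetical (it is not part of ggplot2 and is not the code from the linked repository), and set.seed is only there to make the example reproducible. The plot is skipped if ggplot2 is not installed.

```r
# Hypothetical helper (not part of ggplot2): weighted ECDF as a step function.
weighted_ecdf <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  # Cumulative share of the total weight at each sorted x
  cum <- cumsum(w) / sum(w)
  # For tied x values, keep only the last (largest) cumulative share
  keep <- !duplicated(x, fromLast = TRUE)
  # Right-continuous steps: 0 to the left of min(x), 1 from max(x) onward
  stepfun(x[keep], c(0, cum[keep]))
}

set.seed(1)  # assumption: fixed seed, only for reproducibility
df <- data.frame(x = sort(runif(100)), w = 1:100)
Fw <- weighted_ecdf(df$x, df$w)
df$cum.pct <- Fw(df$x)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # geom_step draws the ECDF as steps instead of interpolating between points
  p <- ggplot2::ggplot(df, ggplot2::aes(x, cum.pct)) + ggplot2::geom_step()
  print(p)
}
```

Using geom_step rather than geom_line keeps the curve flat between observations, which is what the definition of F(a) implies. With this many points the two look almost identical.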