rweibull

Predictions from a Weibull fit don't match original data distribution


As a follow-up to this question: Interpreting Weibull parameters from survreg, I'm trying to understand why histograms of predictions from the fitted model don't seem to match the histogram of the original data. Example using code borrowed from that question:

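library(survival)
library(tidyr)    # gather()
library(ggplot2)  # ggplot(), geom_histogram()

y <- rweibull(1000, shape=2, scale=5)
r <- survreg(Surv(y)~1, dist="weibull")
a <- 1/r$scale      # Approximately 2
b <- exp( coef(r) ) # Approximately 5
# Two ways to simulate from the fitted Weibull(shape=a, scale=b):
y2 <- b * ( -log( 1-runif(1000) ) ) ^(1/a)  # inverse-CDF sampling
y3 <- rweibull(1000, shape=a, scale=b)      # fitted scale b, not the true value 5

df2 <- data.frame(y,y2,y3)
df2 <- gather(df2)  # long format with columns key and value

ggplot(df2, aes(x = value, fill=key)) + geom_histogram()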

The plot looks like this:

[image: histogram of y, y2 and y3, filled by key]

Why is the height reached on the y axis different for each y?


Solution

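  • Use geom_histogram(position = "identity"). By default, geom_histogram() uses position = "stack", so within each bin the bars for y, y2 and y3 are piled on top of one another and the total height is the sum of the three counts, not the count of any single series. With position = "identity" every series is drawn from the same zero baseline, so the heights are directly comparable; setting alpha below 1 keeps overlapping bars visible.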
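
A minimal sketch of that fix, assuming the survival and ggplot2 packages are installed. The fixed seed, alpha = 0.4, bins = 30, and the base-R reshaping (used here in place of gather()) are illustrative choices, not part of the original question or answer.

```r
library(survival)
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Simulate data and fit an intercept-only Weibull model, as in the question.
y <- rweibull(1000, shape = 2, scale = 5)
r <- survreg(Surv(y) ~ 1, dist = "weibull")
a <- 1 / r$scale     # estimated shape
b <- exp(coef(r))    # estimated scale

# Two equivalent ways to draw from the fitted distribution.
y2 <- b * (-log(1 - runif(1000)))^(1 / a)
y3 <- rweibull(1000, shape = a, scale = b)

# Long format built with base R: one row per observation.
df2 <- data.frame(
  key   = rep(c("y", "y2", "y3"), each = 1000),
  value = c(y, y2, y3)
)

# position = "identity" overlays the three histograms on a common baseline;
# alpha makes the overlapping bars see-through.
p <- ggplot(df2, aes(x = value, fill = key)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30)
print(p)
```

If the overlaid bars are still hard to read, adding facet_wrap(~ key) to the plot draws each series in its own panel on a shared y axis.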