As a follow-up to this question (Interpreting Weibull parameters from survreg), I'm trying to understand why histograms of predictions based on the model fit don't seem to match histograms of the original data. Example using code borrowed from that question:
library(survival)
library(tidyr)   # provides gather()
library(ggplot2) # provides ggplot() and geom_histogram()
y <- rweibull(1000, shape=2, scale=5)
r <- survreg(Surv(y)~1, dist="weibull")
a <- 1/r$scale # Approximately 2
b <- exp( coef(r) ) # Approximately 5
y2 <- b * ( -log( 1-runif(1000) ) ) ^(1/a)
y3 <- rweibull(1000, shape=a, scale=b) # use the fitted scale b rather than the true value 5
df2 <- data.frame(y,y2,y3)
df2 <- gather(df2)
ggplot(df2, aes(x = value, fill=key)) + geom_histogram()
The plot looks like this:
Why is the height reached on the y axis different for each y?
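Use geom_histogram(position = "identity"). The default position for geom_histogram is "stack", so the bars for y, y2 and y3 are piled on top of one another; each group's bars start where the previous group's bars end, which is why the groups appear to reach different heights on the y axis even though their counts are similar.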
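For illustration, here is a minimal sketch of the question's code with that change applied. The fixed seed, the alpha = 0.4 transparency and bins = 30 are my own assumptions, added only so the run is reproducible and the overlapping bars stay visible; they are not part of the original code.

```r
library(survival)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only for reproducibility

# Simulate data and fit the Weibull model as in the question
y <- rweibull(1000, shape = 2, scale = 5)
r <- survreg(Surv(y) ~ 1, dist = "weibull")
a <- 1 / r$scale     # fitted shape
b <- exp(coef(r))    # fitted scale

# Two ways of drawing from the fitted distribution
y2 <- b * (-log(1 - runif(1000)))^(1 / a)   # inverse-CDF sampling
y3 <- rweibull(1000, shape = a, scale = b)

# Long format: one row per observation, 'key' identifies the series
df2 <- gather(data.frame(y, y2, y3))

# position = "identity" draws every group from a baseline of zero
# instead of stacking; alpha makes the overlap visible (assumption)
p <- ggplot(df2, aes(x = value, fill = key)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30)
print(p)
```

With the bars overlaid rather than stacked, the three histograms can be compared directly. If the overlap is hard to read, adding facet_wrap(~ key) to the plot is another option.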