r ggplot2 statistics probability-density beta-distribution

Plot beta distribution in R


Using the Lahman::Batting dataset, I've estimated the parameters of a beta distribution for career batting averages. Now I want to plot this empirically fitted beta distribution on top of the histogram it was estimated from.

library(dplyr)
library(tidyr)
library(ggplot2)  # needed for ggplot() below
library(Lahman)

career <- Batting %>%
  filter(AB > 0) %>%
  anti_join(Pitching, by = "playerID") %>%
  group_by(playerID) %>%
  summarize(H = sum(H), AB = sum(AB)) %>%
  mutate(average = H / AB)

I can plot the distribution of career batting averages (players with more than 500 at-bats) as:

career %>% 
  filter(AB > 500) %>% 
  ggplot(aes(x = average)) +
  geom_histogram()

And obtain:

[Image: histogram of career batting averages]

I know I can add + geom_freqpoly(color = "red") to get a frequency polygon:

[Image: the same histogram with a red frequency polygon overlaid]

But I want a smooth beta density curve instead. I estimate the beta parameters with:

career_filtered <- career %>%
  filter(AB >= 500)

m <- MASS::fitdistr(career_filtered$average, dbeta,
                    start = list(shape1 = 1, shape2 = 10))

alpha0 <- m$estimate[1] # shape1 (alpha)
beta0 <- m$estimate[2]  # shape2 (beta)
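As a quick cross-check on the fitdistr() result (a maximum-likelihood fit), the beta parameters can also be estimated by the method of moments. A beta(alpha, beta) distribution has mean m = alpha / (alpha + beta) and variance v = m (1 - m) / (alpha + beta + 1), so alpha + beta = m (1 - m) / v - 1. This is a swapped-in technique, not what the question uses; beta_mom() is a hypothetical helper, and the simulated data with shape parameters 80 and 220 is only a stand-in for career_filtered$average.

```r
# Method-of-moments estimates for a beta distribution.
# Hypothetical helper; a cross-check, not the MLE that MASS::fitdistr computes.
beta_mom <- function(x) {
  m <- mean(x)
  v <- var(x)
  # k = alpha + beta; must be positive for a valid beta fit
  k <- m * (1 - m) / v - 1
  c(shape1 = m * k, shape2 = (1 - m) * k)
}

set.seed(42)
# Simulated stand-in data with hypothetical true parameters 80 and 220
sim <- rbeta(5000, shape1 = 80, shape2 = 220)
est <- beta_mom(sim)
print(round(est, 1))

# With the real data, the estimates could seed the MLE fit, e.g.:
# MASS::fitdistr(career_filtered$average, dbeta,
#                start = as.list(beta_mom(career_filtered$average)))
```

Starting fitdistr() near the moment estimates, rather than at shape1 = 1, shape2 = 10, may help the optimizer converge, though that is a judgment call rather than a requirement.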

Now that I have the parameters alpha0 and beta0, how do I overlay the beta density on the histogram to get something like this:

[Image: histogram with a smooth red beta density curve overlaid]

This question is based on a post I'm reading here.


Solution

  • All code, including the code for the plots, can be found here. The following code produces the requested plot:

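    ggplot(career_filtered) +
      # after_stat(density) rescales the bars so the histogram integrates to 1,
      # matching the scale of dbeta(); older ggplot2 versions spell it ..density..
      geom_histogram(aes(average, y = after_stat(density)), binwidth = .005) +
      stat_function(fun = function(x) dbeta(x, alpha0, beta0), color = "red",
                    linewidth = 1) +  # ggplot2 >= 3.4.0; use size = 1 on older versions
      xlab("Batting average")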
    

    Hope this helps.
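If you would rather keep the y axis in counts, you can scale the density instead of the histogram: a bin of width w holds roughly n * w * f(x) observations, where f is the beta density and n the number of players. A minimal sketch, assuming ggplot2 >= 3.4.0 (for linewidth) and using simulated data with hypothetical shape parameters 80 and 220 in place of career_filtered, alpha0 and beta0:

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in data; replace with career_filtered and the fitted values
df <- data.frame(average = rbeta(2000, shape1 = 80, shape2 = 220))
alpha0 <- 80     # hypothetical shape1
beta0 <- 220     # hypothetical shape2
binwidth <- 0.005
n_obs <- nrow(df)

p <- ggplot(df, aes(x = average)) +
  # plain count histogram, no density rescaling
  geom_histogram(binwidth = binwidth) +
  # expected count per bin is approximately n_obs * binwidth * density
  stat_function(fun = function(x) n_obs * binwidth * dbeta(x, alpha0, beta0),
                color = "red", linewidth = 1) +
  xlab("Batting average")

print(p)
```

With the real data, drop the hypothetical alpha0/beta0 assignments so the fitdistr() values are used. Both versions draw the same picture; only the y-axis scale differs.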