Search code examples
rstatisticsprobability

Calculate log probability, darts game R


I am having trouble with the following problem, I have done some research but I still cannot come up with any solution to this problem.
Darts Player shoots 30 times every night for a period of 42 days. Create a function which takes the probability p of shooting the target and calculates the log of probability that the player has done the following shoots of the target for each of the 42 days:

shots = c(
  8, 5, 12, 11, 12, 8, 6, 7, 11, 7, 11, 13, 15,
  12, 17, 12, 9, 15, 8, 11, 11, 13, 10, 8, 12, 12, 11,
  13, 12, 14, 9, 11, 13, 10, 10, 12, 13, 10, 15, 12, 15, 12
)

I am new to probability and this type of programming in R, so any help and approach to solving this problem would be appreciated. Thank you in advance!


Solution

  • The probability of getting 8 shots or less given a hit probability of 0.5 can be found with:

    pbinom(8, 30, 0.5)
    

    But to find the probability of exactly 8 shots, we need to subtract the probability of getting 7 shots or less:

    pbinom(8, 30, 0.5) - pbinom(8 - 1, 30, 0.5)
    

    Since pbinom is vectorized, we can get the independent probabilities of getting all the shots with:

    pbinom(shots, 30, 0.5) - pbinom(shots - 1, 30, 0.5)
    

    But this gives us a vector of 42 probabilities. To get the probability of getting exactly this string of shots, we need to multiply all these probabilities together:

    prod(pbinom(shots, 30, 0.5) - pbinom(shots - 1, 30, 0.5))
    #> [1] 2.921801e-62
    

    And the log of this value is what we're looking for:

    log(prod(pbinom(shots, 30, 0.5) - pbinom(shots - 1, 30, 0.5)))
    #> [1] -141.6881
    

    Note though that we might run into problems with floating point numbers being unable to handle very small numbers, so it is safer to take the sum of the logs rather than the log of the product, which is otherwise mathematically equivalent.

    sum(log(pbinom(shots, 30, 0.5) - pbinom(shots - 1, 30, 0.5)))
    #> [1] -141.6881
    

    Now all we need to do is wrap this in a function which allows us to specify a number other than 0.5 for probability:

    f <- function(p) {
      shots = c(
      8, 5, 12, 11, 12, 8, 6, 7, 11, 7, 11, 13, 15,
      12, 17, 12, 9, 15, 8, 11, 11, 13, 10, 8, 12, 12, 11,
      13, 12, 14, 9, 11, 13, 10, 10, 12, 13, 10, 15, 12, 15, 12
      )
      sum(log(pbinom(shots, 30, p) - pbinom(shots - 1, 30, p)))
    }
    

    The reason you are being asked this question is probably as an introduction to likelihood. We can see the likelihood curve of the p parameter by plotting the log probability of getting exactly shots given a particular value of p

    probs <- seq(0.01, 0.99, 0.01)
    plot(probs, sapply(probs, f))
    

    enter image description here

    We can find the value of p with the greatest likelihood by using optimize:

    optimize(f, c(0.01, 0.99), maximum = TRUE)$maximum
    #> [1] 0.3714248
    

    So we can infer that the player had approximately 37.14% chance of hitting the target each time.

    We can confirm this is right by simply calculating the percentage of throws the dart player made, which should give us the same value:

    mean(shots/30)
    #> [1] 0.3714286