I am trying to make a function in order to generate n
random numbers from negative binomial distribution.
To generate it, I first made a function to generate n
random variables from geometric distribution. My function for generating n
random numbers from geometric distribution as follows:
rGE<-function(n,p){
I<-rep(NA,n)
for (j in 1:n){
x<-rBer(1,p)
i<-1 # number of trials
while(x==0){
x<-rBer(1,p)
i<-i+1
}
I[j]<- i
}
return(I)
}
I tested this function (rGE
), for example for rGE(10,0.5)
, which is generating 10
random numbers from a geometric distribution with probability of success 0.5, a random result was:
[1] 2 4 2 1 1 3 4 2 3 3
In rGE
function I used a function named rBer
which is:
rBer<-function(n,p){
sample(0:1,n,replace = TRUE,prob=c(1-p,p))
}
Now, I want to improve my above function (rGE
) in order to make a function for generating n
random numbers from a negative binomial function. I made the following function:
rNB<-function(n,r,p){
I<-seq(n)
for (j in 1:n){
x<-0
x<-rBer(1,p)
i<-1 # number of trials
while(x==0 & I[j]!=r){
x<-rBer(1,p)
i<-i+1
}
I[j]<- i
}
return(I)
}
I tested it for rNB(3,2,0.1)
, which generates 3 random numbers from a negative binomial distribution with parametrs r=2
and p=0.1
for several times:
> rNB(3,2,0.1)
[1] 2 1 7
> rNB(3,2,0.1)
[1] 3 1 4
> rNB(3,2,0.1)
[1] 3 1 2
> rNB(3,2,0.1)
[1] 3 1 3
> rNB(3,2,0.1)
[1] 46 1 13
As you can see, I think my function (rNB
) does not work correctly, because the results always generat 1
for the second random number.
Could anyone help me to correct my function (rNB
) in order to generate n
random numbers from a negative binomial distribution with parametrs n
, r
, and p
. Where r
is the number of successes and p
is the probability of success?
[[Hint: Explanations regarding geometric distribution and negative binomial distribution: Geometric distribution: In probability theory and statistics, the geometric distribution is either of two discrete probability distributions:
Negative binomial distribution:A negative binomial experiment is a statistical experiment that has the following properties: The experiment consists of x repeated trials. Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure. The probability of success, denoted by P, is the same on every trial. The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials. The experiment continues until r successes are observed, where r is specified in advance. ]]
Your function will be much faster if you use R's native vectorization. The way you can do this is to generate all your Bernoulli trials at once.
Note that for a negative binomial distribution, the expected value (i.e. the mean number of Bernoulli trials it will take to get r
successes) is r * p / (1 - p)
(Reference)
If we want to draw n
negative binomial samples, then the expected total number of Bernoulli trials will therefore be n * r * p / (1 - p)
. So we want to draw at least that many Bernoulli samples. For simplicity, we can start by drawing twice that number: 2 * n * r * p / (1 - p)
. In the unlikely case that this is not enough, we can draw twice as many again repeatedly until we have enough; once the sum of the resultant vector of Bernoulli trials is greater than r * n
, we know we have enough Bernoulli trials to simulate our n
negative binomial trials.
We can now run a cumsum
on the vector of Bernoulli trials to keep track of the number of positive trials. If you then perform integer division on this vector by %/% r
, you will have all the Bernoulli trials labelled according to which negative binomial trial they belonged to. You then table
this vector.
The first r
numbers of the table (obtained by subsetting the table by [1:n]
or equivalently by [seq(n)]
is your negative binomial draw. We just remove the table's names by using as.numeric
. We also subtract the number of successes (i.e. r
), from each of our counts, since we are only counting the failures, not the successes.
rNB <- function(n, r, p) {
mult <- 2
all_samples <- 0
while(sum(all_samples) < n * r)
{
all_samples <- rBer(mult * n * r * p / (1 - p), p)
mult <- mult * 2
}
as.numeric(table(cumsum(all_samples) %/% r))[seq(n)] - r
}
So we can do:
rNB(3, 2, 0.1)
#> [1] 14 19 41
rNB(3, 2, 0.1)
#> [1] 23 6 56
rNB(3, 2, 0.1)
#> [1] 11 31 59
rNB(3, 2, 0.1)
#> [1] 7 21 14
mean(rNB(10000, 2, 0.1))
#> [1] 18.0002
We can test this against R's own rnbinom
:
mean(rnbinom(10000, 2, 0.1))
#> [1] 18.0919
hist(rnbinom(10000, 2, 0.5), breaks = 0:20)
hist(rNB(10000, 2, 0.5), breaks = 0:20)
Note that the logic of your own version isn't quite right. In particular, the line while(x == 0 & I[j] != r)
doesn't make any sense. I
is a vector of 1:n
, so in your example, whenever j
is 2, I[j]
is equal to r
and the loop stops. This is why your second number is always 1. I don't know what you were trying to do here.
If you want to do it one Bernoulli trial at a time, as you are doing in your own version, try this modified function. The variable names should hopefully make it easy to follow the logic:
rNB <- function(n, r, p) {
# Create an empty vector of length n for our results
draws <- numeric(n)
# Now for each of the n trials we will get a negative binomial sample:
for (i in 1:n) {
# Create success and failure counters for this draw
failures <- successes <- 0
# Now run Bernoulli trials, counting successes and failures as we go
# until we hit r successes
while(successes < r)
{
if(rBer(1, p) == 1)
successes <- successes + 1
else
failures <- failures + 1
}
# Once we have reached r successes, the current number of failures is our
# negative binomial draw
draws[i] <- failures
}
return(draws)
}
This gives identical results to the faster, albeit more opaque, vectorized version.