Search code examples
rloopsmachine-learningmarkov-chainsmcmc

R: How to generate the following data


I need help with generating random numbers.

p - The probability that a pcr test of a sick person, will show us a positive result. q - The probability that a pcr test of a healthy person, will show us a positive result. s - The general percentage of patients in the population.

Let q = 10%, p = 95% and s = 20%

With this information, I need to generate data for 1000 people who came to be tested.

What I tried to do?

s = 0.2
p = 0.95
q = 0.1

true_result = c()
test_result = c()

for (i in 1:1000){
  x = runif(1,0,1)
  if (x<=s){
    true_result = append(true_result,1)
    r = runif(1,0,1)
    if (r<=p){
      test_result = append(test_result,1)
  
    }else{
      test_result = append(test_result,0)
    }
    
  }else{
    true_result = append(true_result,0)
    r = runif(1,0,1)
    if (r<=q){
      test_result = append(test_result,1)
    }
    else{
      test_result = append(test_result,0)
    }
  }
}



sick = sum(true_result)
tested_sick = sum(test_result)

print(sick)
print(tested_sick)

I'm not sure if this is true and I wanted to know maybe there is another way to do it?

Another question, can it be presented as a binomial distribution?


Solution

  • How about something like this:

    # Sick and positive: .2*.95 = .19
    # Sick and negative: .2*.05 = .01
    # Healthy and positive: .8*.1=.08
    # Healthy and negative: .8*.9 = .72
    
    samp <- sample(1:4, 1000, replace=TRUE, prob=c(.19,.01,.08,.72))
    
    stat <- c("sick", "sick", "healthy", "healthy")
    tst <- c("positive", "negative", "positive", "negative")
    test_data <- data.frame(
      health = stat[samp], 
      test_result = tst[samp]
    )
    head(test_data)
    #>    health test_result
    #> 1 healthy    negative
    #> 2 healthy    negative
    #> 3 healthy    negative
    #> 4    sick    positive
    #> 5    sick    positive
    #> 6 healthy    negative
    

    Created on 2022-06-25 by the reprex package (v2.0.1)