I have simulation in which i create 2 groups of 100 patients in each group. Dataframe contain patients, type of intervention (R for reference and T for test) and recovery status (1 or 0). Then I need to make an experiment: make confidence interval for probabilities difference. I need to repeat experiment for 1000 times (every experiment creat a new sample group of patients) and saving results (df_prop2) in every repeat. At the end I should get a dataframe (df_prop2) with 1000 rows (1000 obs. of 7 variables). I dont understand why using my code I'm getting dataframe with 1 row and 1007 columns. Please help
This is my code:
library(dplyr)
library(tidyr)
set.seed(123)
p_R <- 0.1 #true probability for reference
p_T <- 0.2 #true probabiluty for Test
sample_size <- 100 #I have 100 patients in 2 groups
n_rep <- 1000 #number of repeats
df_all_repeats <- data.frame(
n_exp = rep(1:n_rep, each = 2*sample_size),
arm = rep(c('R', 'T'), each = sample_size),
patient_ID = rep(1:sample_size, n_rep))
df_all_repeats <- df_all_repeats %>%
group_by(n_exp) %>%
mutate(recovery_status = c(sample(c(1,0), sample_size, replace = TRUE, prob = c(p_R, 1 - p_R)),
sample(c(1,0), sample_size, replace = TRUE, prob = c(p_T, 1 - p_T))))
#Experiment
df_prop2 <- df_all_repeats %>%
group_by(arm) %>%
group_by(n_exp) %>%
dplyr::summarise(x = sum(recovery_status),
n = n()) %>%
ungroup() %>%
dplyr::summarise(X = list(x), N = list(n)) %>%
rowwise() %>%
mutate(tst = list(broom::tidy(prop.test(X, N)))) %>%
unnest(tst)
print(df_prop2)
Here's a solution which also simplifies the data generation:
library(dplyr)
library(tidyr)
library(broom)
p_R <- 0.1 # true probability for reference
p_T <- 0.2 # true probabiluty for Test
sample_size <- 100 # number of patients
n_rep <- 1000 # number of repeats
set.seed(123)
df_all_repeats <- expand.grid(
patient_id = 1:sample_size,
arm = c("R", "T"),
n_exp = 1:n_rep,
stringsAsFactors = FALSE) %>%
as_tibble() %>% ## data is sorted n_exp, arm, patient_id
mutate(recovery_status = c(sample(1:0, sample_size, replace = TRUE,
prob = c(p_R, 1 - p_R)),
sample(1:0, sample_size, replace = TRUE,
prob = c(p_T, 1 - p_T))),
.by = n_exp)
df_all_repeats %>%
group_by(n_exp, arm) %>%
summarise(x = sum(recovery_status),
n = n()) %>%
summarise(broom::tidy(prop.test(x, n)))
# # A tibble: 1,000 × 10
# n_exp estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 1 0.07 0.19 5.35 0.0207 1 -0.222 -0.0183 2-sample test for equality of proportions with continuity corre… two.sided
# 2 2 0.1 0.19 2.58 0.108 1 -0.197 0.0168 2-sample test for equality of proportions with continuity corre… two.sided
# 3 3 0.09 0.27 9.79 0.00175 1 -0.294 -0.0665 2-sample test for equality of proportions with continuity corre… two.sided
# 4 4 0.08 0.21 5.81 0.0160 1 -0.236 -0.0241 2-sample test for equality of proportions with continuity corre… two.sided
# 5 5 0.07 0.21 7.02 0.00807 1 -0.244 -0.0358 2-sample test for equality of proportions with continuity corre… two.sided
# 6 6 0.03 0.18 10.4 0.00124 1 -0.242 -0.0576 2-sample test for equality of proportions with continuity corre… two.sided
# 7 7 0.08 0.16 2.32 0.128 1 -0.179 0.0194 2-sample test for equality of proportions with continuity corre… two.sided
# 8 8 0.12 0.15 0.171 0.679 1 -0.135 0.0746 2-sample test for equality of proportions with continuity corre… two.sided
# 9 9 0.08 0.28 12.2 0.000471 1 -0.313 -0.0872 2-sample test for equality of proportions with continuity corre… two.sided
# 10 10 0.11 0.13 0.0473 0.828 1 -0.120 0.0800 2-sample test for equality of proportions with continuity corre… two.sided
# # ℹ 990 more rows
# # ℹ Use `print(n = ...)` to see more rows