I'm trying to simulate y observations from the following equation :
for all possible values of X and random values for the parameter Beta and the vector of parameters Delta.
Using R and packages from the tidyverse, this is what I got so far :
library(tidyverse)
library(extraDistr)
n <- 100
x_max <- 4
set.seed(1)
data <-
tibble(
id = 1:100,
beta = rnorm(100, mean = 0, sd = 1/(x_max - 1)),
delta = cbind(0, rdirichlet(n, rep(1, x_max - 1)))
) %>%
expand_grid(x = 1:x_max)
data
# A tibble: 400 × 4
id beta delta[,1] [,2] [,3] [,4] x
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 1 -0.209 0 0.169 0.170 0.661 1
2 1 -0.209 0 0.169 0.170 0.661 2
3 1 -0.209 0 0.169 0.170 0.661 3
4 1 -0.209 0 0.169 0.170 0.661 4
5 2 0.0612 0 0.0962 0.00297 0.901 1
6 2 0.0612 0 0.0962 0.00297 0.901 2
7 2 0.0612 0 0.0962 0.00297 0.901 3
8 2 0.0612 0 0.0962 0.00297 0.901 4
9 3 -0.279 0 0.241 0.714 0.0451 1
10 3 -0.279 0 0.241 0.714 0.0451 2
# ℹ 390 more rows
# ℹ Use `print(n = ...)` to see more rows
I'm stuck at the part of computing the y values from the formula above. The challenge is to use x as an indexing variable while computing the sums of delta parameters. What I'm looking for is something like this :
data %>%
rowwise() %>%
mutate(
y = beta * sum(c_across(delta[,1]:delta[,x]))
)
However, this code does not work and I get the following error message :
Error in `mutate()`:
ℹ In argument: `y = beta * sum(c_across(delta[, 1]:delta[, x]))`.
ℹ In row 1.
Caused by error in `c_across()`:
! Problem while evaluating `delta[, 1]`.
Caused by error:
! object 'delta' not found
I know there must be some problem with the nested matrix delta
. However, I still feel like dplyr
should be able to support vectorized indexing operations no?
From the example above, the result I expect to get is this :
# A tibble: 400 × 5
# Rowwise:
id beta delta[,1] [,2] [,3] [,4] x y
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 1 -0.209 0 0.169 0.170 0.661 1 0
2 1 -0.209 0 0.169 0.170 0.661 2 -0.0352
3 1 -0.209 0 0.169 0.170 0.661 3 -0.0708
4 1 -0.209 0 0.169 0.170 0.661 4 -0.209
5 2 0.0612 0 0.0962 0.00297 0.901 1 0
6 2 0.0612 0 0.0962 0.00297 0.901 2 0.00589
7 2 0.0612 0 0.0962 0.00297 0.901 3 0.00607
8 2 0.0612 0 0.0962 0.00297 0.901 4 0.0612
9 3 -0.279 0 0.241 0.714 0.0451 1 0
10 3 -0.279 0 0.241 0.714 0.0451 2 -0.0671
# ℹ 390 more rows
# ℹ Use `print(n = ...)` to see more rows
Is this close to what you're looking for?
library(tidyverse)
library(extraDistr)
set.seed(1)
n <- 100
x_max <- 4
data <-
tibble(
id = 1:n,
beta = rnorm(n, mean = 0, sd = 1/(x_max - 1)),
delta = cbind(0, rdirichlet(n, rep(1, x_max - 1)))
) %>%
expand_grid(x = 1:x_max)
data %>%
rowwise() %>%
mutate(
y = beta * sum(delta[,1:x])
)
#> # A tibble: 400 × 5
#> # Rowwise:
#> id beta delta[,1] [,2] [,3] [,4] x y
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 1 -0.209 0 0.169 0.170 0.661 1 0
#> 2 1 -0.209 0 0.169 0.170 0.661 2 -0.0352
#> 3 1 -0.209 0 0.169 0.170 0.661 3 -0.0708
#> 4 1 -0.209 0 0.169 0.170 0.661 4 -0.209
#> 5 2 0.0612 0 0.0962 0.00297 0.901 1 0
#> 6 2 0.0612 0 0.0962 0.00297 0.901 2 0.00589
#> 7 2 0.0612 0 0.0962 0.00297 0.901 3 0.00607
#> 8 2 0.0612 0 0.0962 0.00297 0.901 4 0.0612
#> 9 3 -0.279 0 0.241 0.714 0.0451 1 0
#> 10 3 -0.279 0 0.241 0.714 0.0451 2 -0.0671
#> # ℹ 390 more rows
I think the confusion that c_across
is having is that delta
is technically behaving as one (matrix) column, so can't be summed 'across':
str(data)
#> tibble [400 × 4] (S3: tbl_df/tbl/data.frame)
#> $ id : int [1:400] 1 1 1 1 2 2 2 2 3 3 ...
#> $ beta : num [1:400] -0.2088 -0.2088 -0.2088 -0.2088 0.0612 ...
#> $ delta: num [1:400, 1:4] 0 0 0 0 0 0 0 0 0 0 ...
#> $ x : int [1:400] 1 2 3 4 1 2 3 4 1 2 ...