Vectorized sum of varying length vectors

I'm trying to simulate y observations from the following equation :

$y=\beta\sum_{i=1}^x\delta_i$

for all possible values of X and random values for the parameter Beta and the vector of parameters Delta.

Using R and packages from the tidyverse, this is what I got so far :

library(tidyverse)
library(extraDistr)
n <- 100
x_max <- 4

set.seed(1)
data <- 
    tibble(
        id = 1:100,
        beta = rnorm(100, mean = 0, sd = 1/(x_max - 1)),
        delta = cbind(0, rdirichlet(n, rep(1, x_max - 1)))
    ) %>% 
    expand_grid(x = 1:x_max)

data

# A tibble: 400 × 4
      id    beta delta[,1]   [,2]    [,3]   [,4]     x
   <int>   <dbl>     <dbl>  <dbl>   <dbl>  <dbl> <int>
 1     1 -0.209          0 0.169  0.170   0.661      1
 2     1 -0.209          0 0.169  0.170   0.661      2
 3     1 -0.209          0 0.169  0.170   0.661      3
 4     1 -0.209          0 0.169  0.170   0.661      4
 5     2  0.0612         0 0.0962 0.00297 0.901      1
 6     2  0.0612         0 0.0962 0.00297 0.901      2
 7     2  0.0612         0 0.0962 0.00297 0.901      3
 8     2  0.0612         0 0.0962 0.00297 0.901      4
 9     3 -0.279          0 0.241  0.714   0.0451     1
10     3 -0.279          0 0.241  0.714   0.0451     2
# ℹ 390 more rows
# ℹ Use `print(n = ...)` to see more rows

I'm stuck at the part of computing the y values from the formula above. The challenge is to use x as an indexing variable while computing the sums of delta parameters. What I'm looking for is something like this :

data %>% 
    rowwise() %>% 
    mutate(
        y = beta * sum(c_across(delta[,1]:delta[,x]))
    )

However, this code does not work and I get the following error message :

Error in `mutate()`:
ℹ In argument: `y = beta * sum(c_across(delta[, 1]:delta[, x]))`.
ℹ In row 1.
Caused by error in `c_across()`:
! Problem while evaluating `delta[, 1]`.
Caused by error:
! object 'delta' not found

I know there must be some problem with the nested matrix delta. However, I still feel like dplyr should be able to support vectorized indexing operations no?

From the example above, the result I expect to get is this :

# A tibble: 400 × 5
# Rowwise: 
      id    beta delta[,1]   [,2]    [,3]   [,4]     x        y
   <int>   <dbl>     <dbl>  <dbl>   <dbl>  <dbl> <int>    <dbl>
 1     1 -0.209          0 0.169  0.170   0.661      1  0      
 2     1 -0.209          0 0.169  0.170   0.661      2 -0.0352 
 3     1 -0.209          0 0.169  0.170   0.661      3 -0.0708 
 4     1 -0.209          0 0.169  0.170   0.661      4 -0.209  
 5     2  0.0612         0 0.0962 0.00297 0.901      1  0      
 6     2  0.0612         0 0.0962 0.00297 0.901      2  0.00589
 7     2  0.0612         0 0.0962 0.00297 0.901      3  0.00607
 8     2  0.0612         0 0.0962 0.00297 0.901      4  0.0612 
 9     3 -0.279          0 0.241  0.714   0.0451     1  0      
10     3 -0.279          0 0.241  0.714   0.0451     2 -0.0671 
# ℹ 390 more rows
# ℹ Use `print(n = ...)` to see more rows

Solution

Is this close to what you're looking for?

library(tidyverse)
library(extraDistr)

set.seed(1)

n <- 100
x_max <- 4

data <- 
  tibble(
    id = 1:n,
    beta = rnorm(n, mean = 0, sd = 1/(x_max - 1)),
    delta = cbind(0, rdirichlet(n, rep(1, x_max - 1)))
  ) %>% 
  expand_grid(x = 1:x_max)

data %>% 
  rowwise() %>% 
  mutate(
    y = beta * sum(delta[,1:x])
  )
#> # A tibble: 400 × 5
#> # Rowwise: 
#>       id    beta delta[,1]   [,2]    [,3]   [,4]     x        y
#>    <int>   <dbl>     <dbl>  <dbl>   <dbl>  <dbl> <int>    <dbl>
#>  1     1 -0.209          0 0.169  0.170   0.661      1  0      
#>  2     1 -0.209          0 0.169  0.170   0.661      2 -0.0352 
#>  3     1 -0.209          0 0.169  0.170   0.661      3 -0.0708 
#>  4     1 -0.209          0 0.169  0.170   0.661      4 -0.209  
#>  5     2  0.0612         0 0.0962 0.00297 0.901      1  0      
#>  6     2  0.0612         0 0.0962 0.00297 0.901      2  0.00589
#>  7     2  0.0612         0 0.0962 0.00297 0.901      3  0.00607
#>  8     2  0.0612         0 0.0962 0.00297 0.901      4  0.0612 
#>  9     3 -0.279          0 0.241  0.714   0.0451     1  0      
#> 10     3 -0.279          0 0.241  0.714   0.0451     2 -0.0671 
#> # ℹ 390 more rows

I think the confusion that c_across is having is that delta is technically behaving as one (matrix) column, so can't be summed 'across':

str(data)
#> tibble [400 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ id   : int [1:400] 1 1 1 1 2 2 2 2 3 3 ...
#>  $ beta : num [1:400] -0.2088 -0.2088 -0.2088 -0.2088 0.0612 ...
#>  $ delta: num [1:400, 1:4] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ x    : int [1:400] 1 2 3 4 1 2 3 4 1 2 ...