Search code examples
racrossrowwiserowsum

How to calculate row sums in R with dynamic columns


I have the Dataframe:

df <- tibble(value_2024 = runif(4), 
value_2023 = runif(4), 
value_2022 = runif(4), 
selection = c(2023, 2022, 2024, 2022))

I want to rowwise calculate the sum of the columns value_2022 : value_n, where n is the value in column selection.

I did:

df %>% mutate(value_sum = rowSums(
across(value_2022:ends_with(as.character(df$selection)))
))

This of course gives the warning:

numerical expression has 4 elements: only the first used.

I do understand the warning and also know why it occurs but I don't know how I can solve it.


Solution

  • Update

    It is also possible to make it without rowwise, but you probably need more efforts, e.g.,

    df %>%
        mutate(rid = 1:n()) %>%
        pivot_longer(-c(rid, selection)) %>%
        reframe(
            value_sum = sum(value[1:n() %in% which(name == "value_2022"):which(name == str_c("value_", selection))]),
            .by = c(selection, rid)
        ) %>%
        arrange(rid) %>%
        {
            bind_cols(df, select(., value_sum))
        }
    

    which gives

    # A tibble: 4 × 5
      value_2024 value_2023 value_2022 selection value_sum
           <dbl>      <dbl>      <dbl>     <dbl>     <dbl>
    1      0.897      0.908     0.661       2023     1.57
    2      0.266      0.202     0.629       2022     0.629
    3      0.372      0.898     0.0618      2024     1.33
    4      0.573      0.945     0.206       2022     0.206
    

    Maybe you can try rowwise

    df %>%
        rowwise() %>%
        mutate(value_sum = rowSums(across(value_2022:ends_with(as.character(.$selection)[cur_group_id()]))))
    

    which gives

    # A tibble: 4 × 5
    # Rowwise: 
      value_2024 value_2023 value_2022 selection value_sum
           <dbl>      <dbl>      <dbl>     <dbl>     <dbl>
    1      0.897      0.908     0.661       2023     1.57
    2      0.266      0.202     0.629       2022     0.629
    3      0.372      0.898     0.0618      2024     1.33
    4      0.573      0.945     0.206       2022     0.206
    

    data

    set.seed(0)
    df <- tibble(value_2024 = runif(4), value_2023 = runif(4), value_2022 = runif(4), selection = c(2023, 2022, 2024, 2022))