Search code examples
rdataframedplyrpivot

How to calculate RSE_Var from SE_var/mean_Var row-wise for many variables, Var, using pivot() in R?


I have a dataset that has the weighted mean and weighted standard error of many variables.

input.ds.wt = tibble(
  id = c(1,2,3,4,5,6),
  wt.mean_vOne = c(1, 1, 1.3, 2.3, 1, 0),
  wt.mean_vTwo = rep(c(0.8,0.2), 3),
  wt.SE_vOne = c(0.1,0.01,0.2,0.02,0.3,0.03),
  wt.SE_vTwo = c(0.03,0.3,0.01,0.1,0.4,0.04)
)

> input.ds.wt
# A tibble: 6 x 5
     id wt.mean_vOne wt.mean_vTwo wt.SE_vOne wt.SE_vTwo
  <dbl>        <dbl>        <dbl>      <dbl>      <dbl>
1     1          1            0.8       0.1        0.03
2     2          1            0.2       0.01       0.3 
3     3          1.3          0.8       0.2        0.01
4     4          2.3          0.2       0.02       0.1 
5     5          1            0.8       0.3        0.4 
6     6          0            0.2       0.03       0.04

I think that using some combination of pivot_longer() followed by pivot_wider() will help me calculate RSE_vOne, RSE_vTwo, when I have many more variables than vOne, vTwo. However, I am open to any flexible solution.

What I want is to calculate:

RSE_vOne = wt.SE_vOne/wt.mean_vOne
RSE_vTwo  = wt.SE_vTwo/wt.mean_vTwo
...
RSE_vN = wt.SE_vN/wt.mean_vN

for each row of input.ds.wt, but without specifying the defintion for each vOne, vTwo, ..., etc. In this dataset every vOne, vTwo, ..., vN has a corresponding wt.mean_vN and wt.SE_vN.


Solution

  • I don't think you need double pivots, but splitting and collapsing the paired columns should be enough.

    Probably you can try this

    library(dplyr)
    
    input.ds.wt %>%
        cbind({
            .
        } %>%
            select(!id) %>%
            split.default(str_c("RSE_", sub(".*_", "", names(.)))) %>%
            map_dfc(\(x) x[[2]] / x[[1]]))
    

    where

    • . in {.} is the placeholder for input.ds.wt
    • sub(".*_","",names(.)) replace the substring that matches the pattern ".*_", e.g., "wt.mean_" or "wt.SE_", with empty ""
    • split.default splits the dataframe by columns, according to the above grouping patterns in terms of column names
    • x in map_dfc denotes each paired columns, e.g., wt.mean_vOne (x[[1]]) and wt.SE_vOne (x[[2]])

    and you will obtain

      id wt.mean_vOne wt.mean_vTwo wt.SE_vOne wt.SE_vTwo    RSE_vOne RSE_vTwo
    1  1          1.0          0.8       0.10       0.03 0.100000000   0.0375
    2  2          1.0          0.2       0.01       0.30 0.010000000   1.5000
    3  3          1.3          0.8       0.20       0.01 0.153846154   0.0125
    4  4          2.3          0.2       0.02       0.10 0.008695652   0.5000
    5  5          1.0          0.8       0.30       0.40 0.300000000   0.5000
    6  6          0.0          0.2       0.03       0.04         Inf   0.2000