Search code examples
rdatatable

Function that uses nonstandard indirect reference


I need a function f(B,A) that, given a dataset with the following structure,

T1 T2 T3 T4 T5  ... P1 P2 P3 P4 P5 ...
 1  2  5  8  9  ...  A  C  B  B  A ... 
 1  3  4  6  6  ...  C  A  C  A  B ...

finds the first time B and A appear in Pj columns (starting with j=1) and returns the value difference in the corresponding Ti columns. For instance:

  • in line 1: B appears in P3 first, A appears in P1 first. Then:

f(B, A) = T3 - T1 = 5-1 = 4

  • in line 2: B appears in P5 first, A appears in P2 first. Then:

f(B, A) = T5 - T2 = 6-3 = 3

I can find in which Pj columns B and A appear using str_detect() , but I don't know how to "move" from P_j1, P_j2 to T_j1, T_j2.

Using datatable syntax (or base R) will be appreciated


Solution

  • base R

    f <- function(l1, l2){
      apply(df, 1, function(x){
        dfP <- x[grepl("P", names(x))]
        dfT <- x[grepl("T", names(x))]
        
        as.numeric(dfT[which(dfP == l1)[1]]) - as.numeric(dfT[which(dfP == l2)[1]])
      })
    }
    
    f("B", "A")
    [1] 4 3
    

    Tidyverse

    With this type of data, it's usually best to pivot to long and then back to wide: here is a tidyverse solution, with diff being the desired output.

    library(tidyverse)
    
    df %>% 
      mutate(id = row_number()) %>% 
      pivot_longer(-id, names_pattern = "(\\D)(\\d)", 
                   names_to = c(".value", "group")) %>% 
      group_by(id) %>% 
      mutate(diff = first(T[P == "B"]) - first(T[P == "A"])) %>% 
      pivot_wider(c(id, diff), names_from = group, values_from = c(T, P), names_sep = "")
    

    output

         id  diff    T1    T2    T3    T4    T5 P1    P2    P3    P4    P5   
      <int> <int> <int> <int> <int> <int> <int> <chr> <chr> <chr> <chr> <chr>
    1     1     4     1     2     5     8     9 A     C     B     B     A    
    2     2     3     1     3     4     6     6 C     A     C     A     B