Search code examples
rdataframeconditional-statementsapplydata-manipulation

Adding a conditional statement into an 'apply' function in R


I have built a function with apply and outer that is working great. However, one issue is that I cannot figure out how to add a condition statement without breaking it.

The function currently is:

df$match <- apply(outer(df$c2, df$c2, function(x, y) abs(x - y) <  1e7) &
                       diag(nrow(df)) == 0, 
                       MARGIN = 1,
                       function(x) paste(df$c3[x], collapse = ", "))

Basically, it looks to see if a numeric value in c2 of one row within a certain threshold (1e7) of any other row in df and then tells me which row(s) it matches by returned a specific row ID I have stored in c3. However, I want to modify this to only compare rows with the same numeric value in another column: c1.

Here is what the data looks like:

c1     c2         c3                 match
1      52426577   chr1.52426577_T    chr2.43668478_G, chr2.47207905_G, chr2.47959251_C, chr2.49606475_C
1      108023890  chr1.108023890_A   
1      129776943  chr1.129776943_T   
2      39943710   chr2.39943710_C    chr2.43668478_G, chr2.47207905_G, chr2.47959251_C, 
2      43668478   chr2.43668478_G    chr1.52426577_T, chr2.39943710_C, chr2.47207905_G, chr2.47959251_C
2      47207905   chr2.47207905_G    chr1.52426577_T, chr2.39943710_C, chr2.43668478_G, chr2.47959251_C
2      47959251   chr2.47959251_C    chr1.52426577_T, chr2.39943710_C, chr2.43668478_G, chr2.47207905_G

I've tried this but it doesn't work. The error I get is: attempt to apply non-function

df$match <- apply(outer(df$c2, df$c2, function(x, y) if(df$c1(x) == df$c1(y)) abs(x - y) <  1e7) &
                       diag(nrow(df)) == 0, 
                       MARGIN = 1,
                       function(x) paste(df$c3[x], collapse = ", "))

This is my desired result:

c1     c2         c3                 match
1      52426577   chr1.52426577_T    
1      108023890  chr1.108023890_A   
1      129776943  chr1.129776943_T   
2      39943710   chr2.39943710_C    chr2.43668478_G, chr2.47207905_G, chr2.47959251_C 
2      43668478   chr2.43668478_G    chr2.39943710_C, chr2.47207905_G, chr2.47959251_C
2      47207905   chr2.47207905_G    chr2.39943710_C, chr2.43668478_G, chr2.47959251_C
2      47959251   chr2.47959251_C    chr2.39943710_C, chr2.43668478_G, chr2.47207905_G

I've also tried grouping in dplyr to no avail:

df$match <- sapply(df$c3, function(x){
  
  df %>%
    group_by(c1) %>%
    filter(abs(c2 - c2[c3 == x]) < 1e7,
           c3 != x) %>%
    pull(c3) %>%
    paste0(collapse = ',')
  
})

Error in `filter()`:
! Problem while computing `..1 = abs(d2 - d2[d3 == x]) < 1e+07`.
✖ Input `..1` must be of size 17 or 1, not size 0.
ℹ The error occurred in group 2: d1 = 2.

Solution

  • You can apply your function by group using dplyr:

    library(dplyr)
    
    dat %>%
      group_by(c1) %>%
      mutate(match = apply(`diag<-`(abs(outer(c2, c2, `-`)) < 1e7, FALSE), 1, \(x) toString(c3[x]))) %>%
      ungroup()
    
    # A tibble: 7 × 4
         c1        c2 c3               match                                              
      <dbl>     <dbl> <chr>            <chr>                                              
    1     1  52426577 chr1.52426577_T  ""                                                 
    2     1 108023890 chr1.108023890_A ""                                                 
    3     1 129776943 chr1.129776943_T ""                                                 
    4     2  39943710 chr2.39943710_C  "chr2.43668478_G, chr2.47207905_G, chr2.47959251_C"
    5     2  43668478 chr2.43668478_G  "chr2.39943710_C, chr2.47207905_G, chr2.47959251_C"
    6     2  47207905 chr2.47207905_G  "chr2.39943710_C, chr2.43668478_G, chr2.47959251_C"
    7     2  47959251 chr2.47959251_C  "chr2.39943710_C, chr2.43668478_G, chr2.47207905_G"
    

    Or sticking with base R:

    transform(dat, match = unlist(lapply(split(dat, ~ c1), \(g) with(
      g, apply(`diag<-`(abs(outer(
        c2, c2, `-`
      )) < 1e7, FALSE), 1, \(x) toString(c3[x]))
    ))))
    
       c1        c2               c3                                             match
    11  1  52426577  chr1.52426577_T                                                  
    12  1 108023890 chr1.108023890_A                                                  
    13  1 129776943 chr1.129776943_T                                                  
    21  2  39943710  chr2.39943710_C chr2.43668478_G, chr2.47207905_G, chr2.47959251_C
    22  2  43668478  chr2.43668478_G chr2.39943710_C, chr2.47207905_G, chr2.47959251_C
    23  2  47207905  chr2.47207905_G chr2.39943710_C, chr2.43668478_G, chr2.47959251_C
    24  2  47959251  chr2.47959251_C chr2.39943710_C, chr2.43668478_G, chr2.47207905_G