r dataframe dplyr tidyverse purrr

How to use map_if based on the name of each element in the list


Suppose I have a list of data frames as follows:

df1 <- data.frame(a1 = 1:5, a2 = 1:5, a3 = 1:5)
df2 <- data.frame(a1 = 1:3, a2 = 2:4, a3 = 3:5)
df3 <- data.frame(a1 = 10:20, a2 = 5:15)

l <- list(df1 = df1, df2 = df2, df3 = df3)

How can I perform operations (like mutate) on each element of the list, conditional on the element's name?

For instance, how would I add a new column only when dealing with df1 or df3, and delete a column when dealing with df2?

Could map_if deal with that?

PS: Keep in mind that the list will probably contain more than 3 datasets, so multiple conditions may be needed.


Solution

  • You can do this kind of operation with imap. Because the operation depends on the name of each list element, imap is the right tool: it passes both the element and its name to the function.

    The function passed as .f to imap takes two arguments:

    • .x, the first argument, which is the value of the element
    • .y, the second argument, which is the name of the element or, if the list has no names, its position

    So in this case the .x values are your 3 data frames and the .y values are their names ("df1", "df2", "df3"); if the list were unnamed, they would be the positions 1:3.

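    library(purrr)
    library(dplyr)  # mutate() comes from dplyr
    
    l %>%
      imap(~ if (.y %in% c("df1", "df3")) {
        # add (or overwrite) a3 for df1 and df3
        .x %>% 
          mutate(a3 = a1 + a2)
      } else {
        # drop the third column for every other element
        .x[-3]
      })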
    
    $df1
      a1 a2 a3
    1  1  1  2
    2  2  2  4
    3  3  3  6
    4  4  4  8
    5  5  5 10
    
    $df2
      a1 a2
    1  1  2
    2  2  3
    3  3  4
    
    $df3
       a1 a2 a3
    1  10  5 15
    2  11  6 17
    3  12  7 19
    4  13  8 21
    5  14  9 23
    6  15 10 25
    7  16 11 27
    8  17 12 29
    9  18 13 31
    10 19 14 33
    11 20 15 35
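
    As a quick check of what .y holds in each case, here is a minimal sketch. The lists named and unnamed are made up for illustration and are not part of the question; imap_chr is the variant of imap that returns a character vector.

    ```r
    library(purrr)

    # Named list: .y receives the name of each element
    named <- list(first = 1:3, second = 4:6)
    res_named <- imap_chr(named, ~ paste0(.y, " has sum ", sum(.x)))

    # Unnamed list: .y falls back to the integer position of each element
    unnamed <- list(1:3, 4:6)
    res_unnamed <- imap_chr(unnamed, ~ paste0("element ", .y, " has sum ", sum(.x)))

    print(res_named)
    print(res_unnamed)
    ```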
    

    If instead you want to apply a function only to the elements that meet a condition on their contents, you can use map_if. For example, suppose we want to add an a4 column only when a data frame has more than a certain number of rows. Bear in mind that the .p predicate must return a single TRUE or FALSE for each element:

    # This works: the predicate returns a single TRUE or FALSE per data frame
    l %>%
      map_if(~ nrow(.x) > 3, ~ .x %>%
               mutate(a4 = a1 + a2))
    
    # But this doesn't, because names(.x) returns the column names of each data frame,
    # not its name in the list, so .p yields one value per column instead of a single TRUE or FALSE
    l %>%
      map_if(~ names(.x) %in% c("df1", "df3"), ~ .x %>%
               mutate(a4 = a1 + a2))
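
    If you do want to select by name without imap, two options may be worth trying. This is a sketch rather than a definitive recipe: map_at takes a character vector of element names, and map_if also accepts a logical vector of the same length as the list in place of a predicate function. The .else argument assumes purrr 0.3.0 or later, and the variable names by_name, keep and by_flag are only illustrative.

    ```r
    library(purrr)
    library(dplyr)

    # Same data as in the question, repeated so the block runs on its own
    l <- list(
      df1 = data.frame(a1 = 1:5, a2 = 1:5, a3 = 1:5),
      df2 = data.frame(a1 = 1:3, a2 = 2:4, a3 = 3:5),
      df3 = data.frame(a1 = 10:20, a2 = 5:15)
    )

    # map_at() modifies only the elements named in .at and leaves the rest untouched
    by_name <- map_at(l, c("df1", "df3"), ~ mutate(.x, a4 = a1 + a2))

    # map_if() with a precomputed logical vector: the test on the list names
    # happens outside the predicate, one TRUE/FALSE per element
    keep <- names(l) %in% c("df1", "df3")
    by_flag <- map_if(
      l, keep,
      ~ mutate(.x, a4 = a1 + a2),   # applied where keep is TRUE
      .else = ~ select(.x, -a3)     # applied where keep is FALSE
    )
    ```

    map_at is the shorter choice when the other elements should stay as they are; map_if with .else covers the case where the remaining elements need a different operation.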
    

    For a named list, imap(l, f) is equivalent to map2(l, names(l), f): the second argument holds the names of the list elements, not the column names of each data frame:

    l %>%
      map2(names(l), ~ if(.y %in% c("df1", "df3")) {
        .x %>% 
          mutate(a3 = a1 + a2)
      } else {
        .x[-3]  # drop the third column
      })
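
    For the case raised in the PS, where more datasets mean more conditions, one possible approach is to dispatch on the name with base R's switch inside imap. This is only a sketch: df4 is a hypothetical extra dataset added to show the default branch, and apply_by_name is an illustrative helper name.

    ```r
    library(purrr)
    library(dplyr)

    # Data from the question plus a hypothetical df4 that matches no condition
    l <- list(
      df1 = data.frame(a1 = 1:5, a2 = 1:5, a3 = 1:5),
      df2 = data.frame(a1 = 1:3, a2 = 2:4, a3 = 3:5),
      df3 = data.frame(a1 = 10:20, a2 = 5:15),
      df4 = data.frame(a1 = 1:2, a2 = 3:4)
    )

    # One branch per name; an empty branch falls through to the next one
    apply_by_name <- function(d, nm) {
      switch(nm,
        df1 = ,                          # same treatment as df3
        df3 = mutate(d, a3 = a1 + a2),   # add (or overwrite) a3
        df2 = select(d, -a3),            # drop a3
        d                                # default: return the element unchanged
      )
    }

    res <- imap(l, apply_by_name)
    ```

    Note that this relies on the list being named: with an unnamed list, .y is an integer and switch would select branches by position instead.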