r purrr sapply

Best method of applying multiple functions over similar structured df's in a list?


I have a list object called Profile_list combining multiple df's, all with the same columns (but different numbers of rows):

> summary(Profile_list)
            Length Class      Mode
Profile_19  26     data.frame list
Profile_20  26     data.frame list
Profile_21  26     data.frame list
Profile_40  26     data.frame list
Profile_41  26     data.frame list
Profile_84  26     data.frame list
Profile_92  26     data.frame list
Profile_95  26     data.frame list
Profile_98  26     data.frame list
Profile_106 26     data.frame list
Profile_135 26     data.frame list
Profile_139 26     data.frame list

I want to apply dplyr::select to each df to select columns Col_A and Col_B, find the unique combinations of these two columns, and then assign the results to a new list, Profile_list_unique_indicators, that keeps the same names as the dfs. What would be the best way of achieving this?


Solution

  • Here is a solution with purrr, using map (it works as long as the column names are the same across all data.frames):

    library(dplyr)  # for %>%, select() and group_by()

    purrr::map(my_list, function(x) {
      x %>% select(a, b) %>% group_by(a, b) %>% unique()
    })
    # [[1]]
    # # A tibble: 3 x 2
    # # Groups:   a, b [3]
    #       a     b
    # <dbl> <int>
    # 1     2     1
    # 2     2     2
    # 3     2     3
    # 
    # [[2]]
    # # A tibble: 3 x 2
    # # Groups:   a, b [3]
    #       a     b
    # <dbl> <int>
    # 1     1     4
    # 2     1     5
    # 3     1     6
    

    But I don't see how that differs from simply using distinct (apart from the result not being grouped):

    purrr::map(my_list, function(x) {
      x %>% select(a, b) %>% distinct(a, b)
    })
    # [[1]]
    #   a b
    # 1 2 1
    # 2 2 2
    # 3 2 3
    # 
    # [[2]]
    #   a b
    # 1 1 4
    # 2 1 5
    # 3 1 6
    

    Fake data:

    data <- data.frame(a = rep(2, 4), b = rep(1:3, 4))
    data2 <- data.frame(a = rep(1, 4), b = rep(4:6, 4))
    
    my_list <- list(data, data2)
    my_list
    # [[1]]
    #    a b
    # 1  2 1
    # 2  2 2
    # 3  2 3
    # 4  2 1
    # 5  2 2
    # 6  2 3
    # 7  2 1
    # 8  2 2
    # 9  2 3
    # 10 2 1
    # 11 2 2
    # 12 2 3
    # 
    # [[2]]
    #    a b
    # 1  1 4
    # 2  1 5
    # 3  1 6
    # 4  1 4
    # 5  1 5
    # 6  1 6
    # 7  1 4
    # 8  1 5
    # 9  1 6
    # 10 1 4
    # 11 1 5
    # 12 1 6
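
The fake data above is an unnamed list with columns a and b, whereas the question uses a named list with columns Col_A and Col_B. Here is a minimal sketch of the same distinct approach applied to that case. The two data frames and the extra column Col_C are made up for illustration; only the names Profile_list, Col_A, Col_B and Profile_list_unique_indicators come from the question. map() keeps the names of its input, so the new list should carry the same Profile_* names:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for Profile_list: named data frames with the
# same columns but different numbers of rows (values are invented).
Profile_list <- list(
  Profile_19 = data.frame(
    Col_A = c(1, 1, 2, 2),
    Col_B = c("x", "x", "y", "z"),
    Col_C = 1:4
  ),
  Profile_20 = data.frame(
    Col_A = c(3, 3, 3),
    Col_B = c("x", "x", "y"),
    Col_C = 1:3
  )
)

# distinct() with column arguments keeps only those columns and drops
# duplicate combinations, so a separate select() is not needed here.
Profile_list_unique_indicators <- map(
  Profile_list,
  function(df) df %>% distinct(Col_A, Col_B)
)

# The element names of the input list are carried over to the result.
names(Profile_list_unique_indicators)
```

If you would rather avoid the extra packages, base R's lapply(Profile_list, function(df) unique(df[c("Col_A", "Col_B")])) should give the same unique combinations and also keep the list names. The only difference is that the original row names are retained.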