Search code examples
rfiltergroup-bydplyrsummarize

Make a function to filter and summarize using R


I have these two tables;

   <A>                       <B>
a1    a2                     b1   
ABC   CAFE                   AB
ABD   DRINK                  BF
ABF   CAFE                   ..
ABFF  DRINK
..     ..

I would like to know the summarize table containing B to a1 in table A like this;

library(dplyr)
library(stringr)

A1 <- A %>%
filter(str_detect(a1, "AB")) %>%
group_by(a2) %>%
summarize(n())

A2 <- A %>%
filter(str_detect(a1, "BF")) %>%
group_by(a2) %>%
summarize(n())

However, I should make the code several times so that I would like to a function to input the B table in the str_detect function... How do I make the function?


Solution

  • Here I designed a function called count_fun, which has four arguments. dat is a data frame like A, Scol is a column with strings, Gcol is the grouping column, and String is the test string. See https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html to learn how to design a function using dplyr.

    library(dplyr)
    library(stringr)
    
    count_fun <- function(dat, Scol, Gcol, String){
    
      Scol <- enquo(Scol)
      Gcol <- enquo(Gcol)
    
      dat2 <- dat %>%
        filter(str_detect(!!Scol, String)) %>%
        group_by(!!Gcol) %>%
        summarize(n())
      return(dat2)
    }
    
    count_fun(A, a1, a2, "AB")
    # # A tibble: 2 x 2
    #   a2    `n()`
    #   <chr> <int>
    # 1 CAFE      2
    # 2 DRINK     2
    
    count_fun(A, a1, a2, "BF")
    # # A tibble: 2 x 2
    #   a2    `n()`
    #   <chr> <int>
    # 1 CAFE      1
    # 2 DRINK     1
    

    We can then apply count_fun using lapply to loop through every elements in B.

    lapply(B$b1, function(x){
      count_fun(A, a1, a2, x)
    })
    
    # [[1]]
    # # A tibble: 2 x 2
    #   a2    `n()`
    #   <chr> <int>
    # 1 CAFE      2
    # 2 DRINK     2
    # 
    # [[2]]
    # # A tibble: 2 x 2
    #   a2    `n()`
    #   <chr> <int>
    # 1 CAFE      1
    # 2 DRINK     1
    

    DATA

    A <- read.table(text = "a1    a2
    ABC   CAFE
                    ABD   DRINK 
                    ABF   CAFE
                    ABFF  DRINK
                    ",
                    header = TRUE, stringsAsFactors = FALSE)
    
    B <- data.frame(b1 = c("AB", "BF"), stringsAsFactors = FALSE)