Tags: r, dplyr, tidyverse, purrr, tidyeval

Creating a new mean column in a loop by a custom function


I want to add a new column holding the grouped mean of a target variable for every factor column in a data frame.

So far I can only reproduce the desired result for a single factor variable, A.

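library(dplyr)
library(purrr)

# The 5-element vectors are recycled to match the 10 target values.
df <- data.frame(
  target = c(1, 4, 8, 9, 2, 1, 3, 5, 7, 1),
  A = c("A", "Z", "N", "A", "Z"),
  B = c("B", "Q", "G", "B", "T"),
  C = c("C", "Y", "C", "P", "Y"),
  stringsAsFactors = TRUE  # R >= 4.0 no longer creates factors by default
)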

grouped_mean <- function(data, summary_var, ...) {
  summary_var <- enquo(summary_var)

  data %>%
    # Selects only factor data types and a target column
    select(which(map_chr(., class) == "factor"), !!summary_var) %>%
    group_by(...) %>%
    # Over here I am not able to change column name, so that it yields Mean_A, Mean_B and Mean_C
    mutate(mean = mean(!!summary_var)) %>%
    ungroup()
}

# The grouping column is passed unnamed so that it reaches group_by() via `...`
grouped_mean(data = df,
             summary_var = target,
             A)

I tried looping it over:

map_df(df, grouped_mean(data = df, summary_var = target))

But I get this error:

Error: Can't convert a tbl_df/tbl/data.frame object to function

Questions and inputs:

  1. I am not sure how to write a function that sets the new column name dynamically inside mutate(), so that mean becomes Mean_A, Mean_B and Mean_C.
  2. I tried map_df() to loop over each column of df, without success. The idea is to create new columns that hold the mean of the target feature within each factor level.

Solution

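  • Here is a slightly quirky solution that should work for you, assuming you are OK with hard-coding target as the column you want the mean of. It uses mutate_if() on the factor columns: tapply() computes the mean of target for each factor level, and indexing that result by the factor itself ([.]) expands it back to one value per row. In R 4.0 and later the columns are factors only if df is created with stringsAsFactors = TRUE.

    Then it uses rename_at() to change the names to match your desired output, since mutate_if() produces A_Mean, B_Mean and C_Mean. If you want the names in lowercase, you can wrap gsub() with tolower().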

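    df %>%
      # Per factor column: mean of target by level, indexed back to one value per row
      mutate_if(is.factor, list(Mean = ~tapply(df$target, ., mean)[.])) %>%
      # Swap the generated names A_Mean, B_Mean, C_Mean to Mean_A, Mean_B, Mean_C
      rename_at(vars(ends_with("Mean")), ~gsub("(.*?)_(.*)", "\\2_\\1", .))
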
       target A B C Mean_A Mean_B Mean_C
    1       1 A B C    4.5    4.5   3.75
    2       4 Z Q Y    2.5    3.5   2.50
    3       8 N G C    6.5    6.5   3.75
    4       9 A B P    4.5    4.5   8.00
    5       2 Z T Y    2.5    1.5   2.50
    6       1 A B C    4.5    4.5   3.75
    7       3 Z Q Y    2.5    3.5   2.50
    8       5 N G C    6.5    6.5   3.75
    9       7 A B P    4.5    4.5   8.00
    10      1 Z T Y    2.5    1.5   2.50
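
The scoped verbs mutate_if() and rename_at() are superseded as of dplyr 1.0.0, so the same result can probably be had with across(), which also sets the column names in one step. The following is only a sketch under a few assumptions: it needs dplyr 1.0 or later, add_group_means() is a hypothetical helper name (not part of dplyr), and base R's ave() is swapped in for the tapply() indexing used above.

```r
library(dplyr)

# Same data as in the question. stringsAsFactors = TRUE is set explicitly
# because R >= 4.0 no longer converts strings to factors by default.
df <- data.frame(
  target = c(1, 4, 8, 9, 2, 1, 3, 5, 7, 1),
  A = c("A", "Z", "N", "A", "Z"),
  B = c("B", "Q", "G", "B", "T"),
  C = c("C", "Y", "C", "P", "Y"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: adds one Mean_<column> column per factor column.
add_group_means <- function(data, summary_var) {
  # Extract the summary column once; the lambda below refers to it by name.
  # Assumes `data` has no column that is itself called `values`.
  values <- pull(data, {{ summary_var }})

  data %>%
    mutate(across(
      where(is.factor),
      # ave() returns, for every row, the mean of `values` within that row's level
      ~ ave(values, .x, FUN = mean),
      .names = "Mean_{.col}"
    ))
}

result <- add_group_means(df, target)
print(result)
```

Because the summary column is passed as an argument, target no longer has to be hard-coded, and the .names argument replaces the separate renaming step.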