Why does my custom dplyr function give a different result from the same code run outside the function?

I am writing a function to create a specific number of duplicate rows per group, going from something like this:

library(dplyr)

df1 <- tibble(
  Random_category = c(rep("A", 2), rep("B", 3), rep("C", 6)),
  ID = 1:11,
  Value = sample(1:100, 11, replace = TRUE)
)

   Random_category    ID Value
   <chr>           <int> <int>
 1 A                   1    92
 2 A                   2    11
 3 B                   3    42
 4 B                   4    33
 5 B                   5    93
 6 C                   6    79
 7 C                   7    82
 8 C                   8    46
 9 C                   9    77
10 C                  10    88
11 C                  11    58

To something like this:

   Random_category    ID Value
   <chr>           <int> <int>
 1 A                   2    60
 2 A                   2    60
 3 A                   1     8
 4 A                   2    60
 5 A                   1     8
 6 B                   3    31
 7 B                   4    13
 8 B                   4    13
 9 B                   5    91
10 B                   5    91
11 C                   6    19
12 C                   9    72
13 C                   7    26
14 C                  10    85
15 C                   8    67

My function looks like this:

duplicate_rows <- function(df, target_num_of_rows, group_name) {
  df %>%
    group_by({{group_name}}) %>%
    mutate(rows_to_duplicate = if_else(row_number() <= target_num_of_rows, ceiling(target_num_of_rows / n()), 0)) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    slice_sample(by = {{group_name}}, n = target_num_of_rows)
}

# Duplicate rows ensuring each group has exactly 5 rows
df_duplicated <- duplicate_rows(df1, 5, "Random_category")

But instead it gives me:

  Random_category    ID Value `"Random_category"`
  <chr>           <int> <int> <chr>
1 A                   2    60 Random_category
2 A                   1     8 Random_category
3 B                   3    31 Random_category
4 B                   4    13 Random_category
5 B                   5    91 Random_category

Yet when I take the dplyr pipeline out of the function and hard-code the arguments, it works perfectly:

df1 %>%
  group_by(Random_category) %>%
  mutate(rows_to_duplicate = if_else(row_number() <= 5, ceiling(5 / n()), 0)) %>%
  slice(rep(row_number(), times = rows_to_duplicate)) %>%
  ungroup() %>%
  select(-rows_to_duplicate) %>%
  slice_sample(by = Random_category, n = 5)

I suspect it has something to do with how the group name is passed, but I don't understand why.

Solution

Pass the column name as a symbol, not a string: use backticks (or no quotes at all) instead of double quotes.

duplicate_rows(df1, 5, `Random_category`)
# # A tibble: 15 × 3
#    Random_category    ID Value
#    <chr>           <int> <int>
#  1 A                   2    11
#  2 A                   1    92
#  3 A                   1    92
#  4 A                   2    11
#  5 A                   1    92
#  6 B                   4    33
#  7 B                   5    93
#  8 B                   3    42
#  9 B                   4    33
# 10 B                   5    93
# 11 C                   8    46
# 12 C                   9    77
# 13 C                   7    82
# 14 C                  10    88
# 15 C                   6    79

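The {{ }} (embrace) operator forwards the argument exactly as it was written, and it is meant for bare column names (symbols), not strings. group_by() uses data masking: a symbol such as Random_category refers to the existing column, but the string "Random_category" is just a constant. group_by() therefore creates a new column named "Random_category" that holds that string in every row, and all 11 rows end up in a single group. The mutate()/slice() steps then keep only the first 5 rows of the whole data frame (IDs 1 to 5). slice_sample(by = ...) uses tidy selection instead, where a string does select the column, so the final step splits those 5 rows back into A and B. That is why you got 5 rows plus an extra column.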
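A small example makes the difference visible. It is a sketch that assumes dplyr 1.1.0 or later; df_small is a made-up three-row tibble, not the data from the question:

```r
library(dplyr)

# Made-up example data: two categories, three rows
df_small <- tibble(Random_category = c("A", "A", "B"), ID = 1:3)

# Backticks only quote a name: both forms are the same symbol
identical(quote(`Random_category`), quote(Random_category))  # TRUE
is.character("Random_category")                              # TRUE: a string, not a symbol

# Symbol: group_by() uses the existing column, giving one group per category
by_symbol <- df_small %>% group_by(Random_category)
n_groups(by_symbol)  # 2

# String: group_by() adds a constant column, so every row lands in one group
by_string <- df_small %>% group_by("Random_category")
n_groups(by_string)  # 1
names(by_string)     # third column is named "Random_category", quotes included
```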

If you want the function to accept strings instead, convert the string to a symbol first:

duplicate_rows <- function(df, target_num_of_rows, group_name) {
  # sym() turns the string "Random_category" into the symbol Random_category
  group_name <- sym(group_name)
  df %>%
    group_by({{group_name}}) %>%
    mutate(rows_to_duplicate = if_else(row_number() <= target_num_of_rows, ceiling(target_num_of_rows / n()), 0)) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    slice_sample(by = {{group_name}}, n = target_num_of_rows)
}
duplicate_rows(df1, 5, "Random_category")
# # A tibble: 15 × 3
#    Random_category    ID Value
#    <chr>           <int> <int>
#  1 A                   1    92
#  2 A                   1    92
#  3 A                   2    11
#  4 A                   1    92
#  5 A                   2    11
#  6 B                   5    93
#  7 B                   4    33
#  8 B                   4    33
#  9 B                   5    93
# 10 B                   3    42
# 11 C                   9    77
# 12 C                   8    46
# 13 C                   7    82
# 14 C                  10    88
# 15 C                   6    79

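... but now passing a bare symbol no longer works, because sym(group_name) forces the argument, so R looks for a variable called Random_category in your workspace and finds none: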

duplicate_rows(df1, 5, `Random_category`)
# Error: object 'Random_category' not found
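For comparison, a string-only variant can skip the symbol conversion by using the .data pronoun inside group_by() and all_of() for slice_sample()'s by argument, the patterns the dplyr "Programming with dplyr" vignette suggests for column names held in strings. This is a sketch, not part of the original answer: duplicate_rows_chr and group_col are hypothetical names, and it assumes dplyr 1.1.0 or later (needed for by =). Like the sym() version, it accepts strings only.

```r
library(dplyr)

set.seed(1)  # Value is random; the seed only makes reruns repeatable
df1 <- tibble(
  Random_category = c(rep("A", 2), rep("B", 3), rep("C", 6)),
  ID = 1:11,
  Value = sample(1:100, 11, replace = TRUE)
)

# Hypothetical string-only variant: group_col is a column name given as a string
duplicate_rows_chr <- function(df, target_num_of_rows, group_col) {
  df %>%
    # .data[[ ]] looks the string up as a column (data-masking context)
    group_by(.data[[group_col]]) %>%
    mutate(rows_to_duplicate = if_else(row_number() <= target_num_of_rows,
                                       ceiling(target_num_of_rows / n()), 0)) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    # all_of() selects the column named by the string (tidy-select context)
    slice_sample(by = all_of(group_col), n = target_num_of_rows)
}

res <- duplicate_rows_chr(df1, 5, "Random_category")
table(res$Random_category)  # 5 rows each for A, B and C
```

The row counts do not depend on the seed: A (2 rows) and B (3 rows) are over-duplicated to 6 rows and sampled down to 5, while C keeps its first 5 rows.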

Choose whichever strategy makes the most sense to you.

Edit: @Onyambu suggested a way that handles both:

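duplicate_rows <- function(df, target_num_of_rows, group_name) {
  # substitute() captures the argument as written (a symbol or a string);
  # as.character() then as.name() turn either form into a symbol
  group_name <- as.name(as.character(substitute(group_name)))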
  df %>%
    group_by({{group_name}}) %>%
    mutate(rows_to_duplicate = if_else(row_number() <= target_num_of_rows, ceiling(target_num_of_rows / n()), 0)) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    slice_sample(by = {{group_name}}, n = target_num_of_rows)
}
duplicate_rows(df1, 5, "Random_category") # works
duplicate_rows(df1, 5, `Random_category`) # works

I like the fact that it is flexible, though I do believe that sometimes polymorphism can go too far. Not sure if this is one of those times ...
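One place where that flexibility stops: substitute() captures the argument exactly as it was typed at the call site, so a string stored in a variable arrives as the variable's name, not its contents. A base-R sketch, where capture_name is a hypothetical helper that mirrors the as.name(as.character(substitute(...))) line:

```r
# Hypothetical helper mirroring the conversion in Onyambu's version
capture_name <- function(group_name) {
  as.name(as.character(substitute(group_name)))
}

col <- "Random_category"

capture_name(Random_category)    # returns the symbol Random_category
capture_name("Random_category")  # returns the symbol Random_category
capture_name(col)                # returns the symbol col, not Random_category
```

So a call such as duplicate_rows(df1, 5, col) would try to group by the symbol col instead of the Random_category column.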