bind_rows() creates duplicate of each dataframe when binding them in R

Assuming this is my dataframe:

df <- data.frame(grp = c("ab -10", "ab 0", "ab 8", "ab -1",
                         "ab 6", "ab 6", "ab -10", "ab 1",
                         "ab -10", "ab 0", "ab 8", "ab -1",
                         "ab 6", "ab 6", "ab -10", "ab 1",
                         "d", "e", "e", "e"),
                 freq = c(1,0,0,1,0,1,2,0,1,0,2,2,1,1,0,1,0,2,2,1))
df
      grp freq
1  ab -10    1
2    ab 0    0
3    ab 8    0
4   ab -1    1
5    ab 6    0
6    ab 6    1
7  ab -10    2
8    ab 1    0
9  ab -10    1
10   ab 0    0
11   ab 8    2
12  ab -1    2
13   ab 6    1
14   ab 6    1
15 ab -10    0
16   ab 1    1
17      d    0
18      e    2
19      e    2
20      e    1

I want to have:

> finaldf
     grp freq
1 ab < 0    7
2 ab 0-5    1
3  ab 5+    5
4      d    0
5      e    5

This is what I tried:

df %>%
  bind_rows(df %>%
              filter(!grepl("ab", grp)),
            
            df %>%
              filter(grepl("ab", grp)) %>%
              mutate(grp = parse_number(grp)) %>%
              mutate(grp = cut(as.numeric(grp),
                                          breaks = c(-999, 0, 6, 999),
                                          labels = c("ab < 0", "ab 0-5", "ab 5+"),
                                          right = F))) %>%
              group_by(grp) %>%
              summarise(N =n())

but it seems like bind_rows is duplicating dataframes.

      grp freq
1  ab -10    1
2    ab 0    0
3    ab 8    0
4   ab -1    1
5    ab 6    0
6    ab 6    1
7  ab -10    2
8    ab 1    0
9  ab -10    1
10   ab 0    0
11   ab 8    2
12  ab -1    2
13   ab 6    1
14   ab 6    1
15 ab -10    0
16   ab 1    1
17      d    0
18      e    2
19      e    2
20      e    1
21      d    0
22      e    2
23      e    2
24      e    1
25 ab < 0    1
26 ab 0-5    0
27  ab 5+    0
28 ab < 0    1
29  ab 5+    0
30  ab 5+    1
31 ab < 0    2
32 ab 0-5    0
33 ab < 0    1
34 ab 0-5    0
35  ab 5+    2
36 ab < 0    2
37  ab 5+    1
38  ab 5+    1
39 ab < 0    0
40 ab 0-5    1

I can slice() half of the rows, but I m more interested in knowing what I am doing wrong?

Any other neat and pretty approach is also highly appreciated!

Solution

Here is one method where will split the column into 'two' with separate, recode the numeric values, unite and then do a group by sum

library(dplyr)
library(tidyr)
df %>% 
  separate(grp, into = c('grp1', 'value'), sep = "(?<=ab)\\s+",
   fill = "right", convert = TRUE) %>% 
  mutate(value = case_when(value <0 ~ '< 0', 
     between(value, 0, 5) ~ '0-5', value > 5 ~ '5+')) %>%
  unite(grp, grp1, value, na.rm = TRUE, sep=" ") %>% 
  group_by(grp) %>%
  summarise(freq = sum(freq), .groups = 'drop')

-output

# A tibble: 5 × 2
  grp     freq
  <chr>  <dbl>
1 ab < 0     7
2 ab 0-5     1
3 ab 5+      5
4 d          0
5 e          5

In the OP's code, it is the beginning df %>% needs to removed as we are passing both filtered datasets in bind_rows. When we add the df %>%, it will be passed as the first argument to bind_rows, thus duplicating the rows

library(readr)
bind_rows(df %>%
              filter(!grepl("ab", grp)),
            df %>%
              filter(grepl("ab", grp)) %>%
              mutate(grp = parse_number(grp)) %>%
              mutate(grp = cut(as.numeric(grp),
                                          breaks = c(-999, 0, 6, 999),
                                          labels = c("ab < 0", "ab 0-5", "ab 5+"),
                                          right = FALSE))) %>% 
    group_by(grp) %>%
    summarise(N =sum(freq))
# A tibble: 5 × 2
  grp        N
  <chr>  <dbl>
1 ab < 0     7
2 ab 0-5     1
3 ab 5+      5
4 d          0
5 e          5