sum() condition in ifelse statement

This question is related to this question: How to number each repetition in a table in R?

There, the repetitions are numbered, e.g. two repetitions: 1, 2; three repetitions: 1, 2, 3; etc. But if a value is unique (it occurs only once), it should get NA instead of 1.

data: (from akrun, many thanks!)

df1 <- structure(list(Fullname = c("Peter", "Peter", "Alison", "Warren", 
                                   "Jack", "Jack", "Jack", "Jack", "Susan", "Susan", "Henry", "Walison", 
                                   "Tinder", "Peter", "Henry", "Tinder")), row.names = c(NA, -16L
                                   ), class = "data.frame")

My solution would be this:

library(dplyr)

df1 %>% 
  group_by(Fullname) %>% 
  mutate(newcol = seq_along(Fullname)) 

  Fullname newcol
   <chr>     <int>
 1 Peter         1
 2 Peter         2
 3 Alison        1
 4 Warren        1
 5 Jack          1
 6 Jack          2
 7 Jack          3
 8 Jack          4
 9 Susan         1
10 Susan         2
11 Henry         1
12 Walison       1
13 Tinder        1
14 Peter         3
15 Henry         2
16 Tinder        2

Now I try to set each value that occurs only once (e.g. Alison, Warren and Walison) to NA, like akrun did in the linked question (How to number each repetition in a table in R?).

My code uses an ifelse statement that checks whether the sum of the group is > 1.

df1 %>% 
  group_by(Fullname) %>% 
  mutate(newcol = seq_along(Fullname)) %>% 
  mutate(newcol = ifelse(sum(newcol)>1, newcol, NA))

but I get:

 Fullname newcol
   <chr>     <int>
 1 Peter         1
 2 Peter         1
 3 Alison       NA
 4 Warren       NA
 5 Jack          1
 6 Jack          1
 7 Jack          1
 8 Jack          1
 9 Susan         1
10 Susan         1
11 Henry         1
12 Walison      NA
13 Tinder        1
14 Peter         1
15 Henry         1
16 Tinder        1

And I can't grasp why.


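  • We need if/else here instead of ifelse. ifelse returns a result with the same length as its test argument, and sum(newcol) > 1 is a single TRUE/FALSE per group. So when the test is TRUE, ifelse returns only the first element of newcol (or a single NA when it is FALSE), and mutate recycles that one value over the whole group. A plain if/else returns the chosen branch in full: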

    df1 %>% 
      group_by(Fullname) %>% 
      mutate(newcol = row_number(), 
             newcol = if(sum(newcol) > 1) newcol else NA)


    # A tibble: 16 × 2
       Fullname newcol
       <chr>     <int>
     1 Peter         1
     2 Peter         2
     3 Alison       NA
     4 Warren       NA
     5 Jack          1
     6 Jack          2
     7 Jack          3
     8 Jack          4
     9 Susan         1
    10 Susan         2
    11 Henry         1
    12 Walison      NA
    13 Tinder        1
    14 Peter         3
    15 Henry         2
    16 Tinder        2

    Now, let's look at the issue. Within each group, sum(newcol) > 1 is a single TRUE/FALSE; the 'newcol2' column below shows that single value recycled to every row of the group. Inside ifelse there is no such recycling: the test stays at length 1, so the result is also of length 1.

    df1 %>% 
       group_by(Fullname) %>% 
       mutate(newcol = row_number(), newcol2 = sum(newcol) > 1)
    # A tibble: 16 × 3
    # Groups:   Fullname [8]
       Fullname newcol newcol2
       <chr>     <int> <lgl>  
     1 Peter         1 TRUE   
     2 Peter         2 TRUE   
     3 Alison        1 FALSE  
     4 Warren        1 FALSE  
     5 Jack          1 TRUE   
     6 Jack          2 TRUE   
     7 Jack          3 TRUE   
     8 Jack          4 TRUE   
     9 Susan         1 TRUE   
    10 Susan         2 TRUE   
    11 Henry         1 TRUE   
    12 Walison       1 FALSE  
    13 Tinder        1 TRUE   
    14 Peter         3 TRUE   
    15 Henry         2 TRUE   
    16 Tinder        2 TRUE  

    If you want to keep ifelse, replicate the test with rep() so that it has the same length as the group (n() rows):

    df1 %>% 
      group_by(Fullname) %>% 
      mutate(newcol = seq_along(Fullname)) %>% 
      mutate(newcol = ifelse(rep(sum(newcol)>1, n()), newcol, NA))
    # A tibble: 16 × 2
    # Groups:   Fullname [8]
       Fullname newcol
       <chr>     <int>
     1 Peter         1
     2 Peter         2
     3 Alison       NA
     4 Warren       NA
     5 Jack          1
     6 Jack          2
     7 Jack          3
     8 Jack          4
     9 Susan         1
    10 Susan         2
    11 Henry         1
    12 Walison      NA
    13 Tinder        1
    14 Peter         3
    15 Henry         2
    16 Tinder        2
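
    Since the condition really asks whether the group has more than one row, the group size can also be tested directly. This is only a sketch of an alternative, assuming dplyr is installed; the data frame toy is a made-up, shortened stand-in for df1, and NA_integer_ is used so that every group returns an integer:

    ```r
    library(dplyr)

    # Small illustrative data set (hypothetical, not the df1 from the question)
    toy <- data.frame(Fullname = c("Peter", "Peter", "Alison", "Jack", "Peter", "Jack"))

    res <- toy %>%
      group_by(Fullname) %>%
      # n() is the group size: one scalar test per group, so plain if/else fits
      mutate(newcol = if (n() > 1) row_number() else NA_integer_) %>%
      ungroup()

    res$newcol
    # 1 2 NA 1 3 2
    ```

    The test is a scalar per group here as well, which is why if/else (and not ifelse) is the natural choice.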

    In order to understand it better, just take a simple vector

    > v1 <- c(1:5)
    > sum(v1) > 4
    [1] TRUE
    > ifelse(sum(v1) > 4, v1, NA)
    [1] 1

    The sum here is 15, which is definitely greater than 4, so the test is a single TRUE. Because the test has length 1, ifelse returns a result of length 1: the first element of the vector, i.e. 1. The same thing happens inside mutate, except that mutate then recycles that single 1 to fill the whole group.
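
    To see the three variants side by side on the same vector, here is a small base R sketch (the names short, full and plain are made up for illustration):

    ```r
    v1 <- 1:5

    # Length-1 test: ifelse() returns a length-1 result, the first element of v1
    short <- ifelse(sum(v1) > 4, v1, NA)

    # Test replicated to the length of v1: every position gets its own value
    full <- ifelse(rep(sum(v1) > 4, length(v1)), v1, NA)

    # Scalar if/else: the chosen branch is returned whole
    plain <- if (sum(v1) > 4) v1 else NA

    short  # 1
    full   # 1 2 3 4 5
    plain  # 1 2 3 4 5
    ```

    Only the first variant loses elements, because the length of its result follows the length of the test and not the length of v1.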