r, group-by

Create new variable that counts number of repeated values


My dataset looks like this:

df1 <- structure(list(NUMERO = structure(c(1, 2, 3, 3, 4, 5, 6, 6, 6, 
    6), format.stata = "%12.0g"), sexe = structure(c(1L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L), levels = c("Dona", "Home"), class = "factor"), 
        edat = c(71, 73, 44, 44, 70, 69, 56, 56, 23, 19)), row.names = c(NA, 
    -10L), class = c("tbl_df", "tbl", "data.frame"))

NUMERO is my id variable, and I want to create a new variable, membres, that counts how many times each NUMERO value occurs and assigns that count to every observation with that value. So, if there are 4 observations with NUMERO = 6, then membres should be 4 for each of those observations.

In other words, this is the output I'm looking for:

[Image: desired output table, not reproduced here]


Solution

  • Either use add_count(), which appends the per-group row count as a new column:

    library(dplyr)
    df1 %>%
      add_count(NUMERO, name = "membres")
    

    or use mutate() with n() and the .by argument:

    library(dplyr) # version >= 1.1.0
    df1 %>% 
       mutate(membres = n(), .by = NUMERO)
    

    Output:

    # A tibble: 10 × 4
       NUMERO sexe   edat membres
        <dbl> <fct> <dbl>   <int>
     1      1 Dona     71       1
     2      2 Dona     73       1
     3      3 Home     44       2
     4      3 Dona     44       2
     5      4 Home     70       1
     6      5 Dona     69       1
     7      6 Home     56       4
     8      6 Dona     56       4
     9      6 Home     23       4
    10      6 Dona     19       4
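
If dplyr is not available, the same per-id count can probably be obtained in base R. The following is a minimal sketch, not part of the original answer: it assumes the data frame is called df1 as above, rebuilds it as a plain data.frame (dropping the Stata format attribute), and uses ave() to compute the group size for each row.

```r
# Base R sketch; df1 is rebuilt here from the question's data so the
# example is self-contained (the Stata format attribute is omitted).
df1 <- data.frame(
  NUMERO = c(1, 2, 3, 3, 4, 5, 6, 6, 6, 6),
  sexe   = factor(c("Dona", "Dona", "Home", "Dona", "Home",
                    "Dona", "Home", "Dona", "Home", "Dona"),
                  levels = c("Dona", "Home")),
  edat   = c(71, 73, 44, 44, 70, 69, 56, 56, 23, 19)
)

# ave() applies FUN within each NUMERO group and returns a vector
# aligned with the original rows; length() gives the group size.
# ave() keeps the type of its first argument (double), hence as.integer().
df1$membres <- as.integer(ave(df1$NUMERO, df1$NUMERO, FUN = length))

print(df1)
```

Because ave() returns one value per original row, the result can be assigned directly as a new column with no separate join or merge step.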