Search code examples
rcountgroupinguniquesummarize

is there a R function (or sequence of steps) to grouping and summarise (count) a dataframe like this (with some repeated values in the rows)


I have a df like that

df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
                 entity = c('e1','e2','e3','e3','e4'),
                 area = c('a1','a1','a2','a2','a1'),
                 sex=c('M','M','M','F','F'))

and i need to obtain a df like that

df2<- data.frame (area = c('a1', 'a2'),
                  male = c(1,1),
                  female = c(1,1),
                  total=c(2,2))

Count the number of women and men by area


Solution

  • df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
                     entity = c('e1','e2','e3','e3','e4'),
                     area = c('a1','a1','a2','a2','a1'),
                     sex=c('M','M','M','F','F'))
    
    library(tidyverse)
    df %>%
      distinct(user, area, sex) %>%
      mutate(sex = ifelse(sex == "M", "male", "female")) %>% 
      pivot_wider(
        id_cols = area,
        names_from = sex,
        values_from = sex,
        values_fill = 0,
        values_fn = length
      ) %>% 
      mutate(Total = rowSums(across(male:female)))
    #> # A tibble: 2 x 4
    #>   area   male female Total
    #>   <chr> <int>  <int> <dbl>
    #> 1 a1        1      1     2
    #> 2 a2        1      1     2
    

    Created on 2022-01-25 by the reprex package (v2.0.1)