is there a R function (or sequence of steps) to grouping and summarise (count) a dataframe like this (with some repeated values in the rows)

I have a df like that

df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
                 entity = c('e1','e2','e3','e3','e4'),
                 area = c('a1','a1','a2','a2','a1'),
                 sex=c('M','M','M','F','F'))

and i need to obtain a df like that

df2<- data.frame (area = c('a1', 'a2'),
                  male = c(1,1),
                  female = c(1,1),
                  total=c(2,2))

Count the number of women and men by area

Solution

df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
                 entity = c('e1','e2','e3','e3','e4'),
                 area = c('a1','a1','a2','a2','a1'),
                 sex=c('M','M','M','F','F'))

library(tidyverse)
df %>%
  distinct(user, area, sex) %>%
  mutate(sex = ifelse(sex == "M", "male", "female")) %>% 
  pivot_wider(
    id_cols = area,
    names_from = sex,
    values_from = sex,
    values_fill = 0,
    values_fn = length
  ) %>% 
  mutate(Total = rowSums(across(male:female)))
#> # A tibble: 2 x 4
#>   area   male female Total
#>   <chr> <int>  <int> <dbl>
#> 1 a1        1      1     2
#> 2 a2        1      1     2

^{Created on 2022-01-25 by the reprex package (v2.0.1)}