r dplyr purrr

Summarising with purrr::map: the percentage of the total for each level of a factor variable


I'm trying to map a function over each variable in a list. Inside that function, I want to group_by some categories and show, for each group, the percentage of the total that each level of a factor variable accounts for.

For instance, I have this list:

mtcars_list <- c("am", "gear", "carb")

The variable I always want to group by is "cyl", and the variable I want to summarise is "vs". In this case I convert "vs" in the mtcars dataset to a factor:

mtcars$vs <- factor(mtcars$vs , levels=c('0', '1'))

I then run this purrr::map call, which gives me an error when I use count, prop.table or similar inside summarise:

purrr::map(mtcars_list, ~ mtcars %>%
  group_by(cyl, .data[[.x]]) %>%
  summarise(count(vs), .groups = "drop")*100)

When I run this it says:

no applicable method for 'count' applied to an object of class "c('double', 'numeric')"

The result I'm after would look something like this:

First category

       0        1
A   17.7%    83.3%
B    5.0%    95.5%

Second category

       0        1
A    2.0     98.0
B    4.0     96.0

Thank you!!!


Solution

  • Please check if this is the expected output

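    library(dplyr)
    library(tidyr)

    df1 <- purrr::map(mtcars_list, ~ mtcars %>% select(cyl, vs, all_of(.x)) %>%
                        # n: rows per (cyl, category, vs); n2: rows per (cyl, category)
                        mutate(n = n(), .by = c(cyl, all_of(.x), vs)) %>%
                        mutate(n2 = n(), .by = c(cyl, all_of(.x))) %>%
                        # keep a single row for each (cyl, category, vs) combination
                        group_by(cyl, .data[[.x]], vs) %>%
                        slice_tail(n = 1) %>%
                        mutate(perc = (n / n2) * 100) %>%
                        # spread the levels of vs into one column each
                        pivot_wider(id_cols = c(cyl, all_of(.x)), names_from = vs, values_from = perc)
    )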
    
    df1
    
    
    
    [[1]]
    # A tibble: 6 × 4
    # Groups:   cyl, am [6]
        cyl    am   `1`   `0`
      <dbl> <dbl> <dbl> <dbl>
    1     4     0 100    NA  
    2     4     1  87.5  12.5
    3     6     0 100    NA  
    4     6     1  NA   100  
    5     8     0  NA   100  
    6     8     1  NA   100  
    
    [[2]]
    # A tibble: 8 × 4
    # Groups:   cyl, gear [8]
        cyl  gear   `1`   `0`
      <dbl> <dbl> <dbl> <dbl>
    1     4     3   100    NA
    2     4     4   100    NA
    3     4     5    50    50
    4     6     3   100    NA
    5     6     4    50    50
    6     6     5    NA   100
    7     8     3    NA   100
    8     8     5    NA   100
    
    [[3]]
    # A tibble: 9 × 4
    # Groups:   cyl, carb [9]
        cyl  carb   `1`   `0`
      <dbl> <dbl> <dbl> <dbl>
    1     4     1 100    NA  
    2     4     2  83.3  16.7
    3     6     1 100    NA  
    4     6     4  50    50  
    5     6     6  NA   100  
    6     8     2  NA   100  
    7     8     3  NA   100  
    8     8     4  NA   100  
    9     8     8  NA   100
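
On the error in the question: count() is applied to a data frame (it groups and tallies rows), so calling it on the column vs inside summarise() fails at method dispatch. A shorter variant of the approach above, offered as a sketch rather than a drop-in replacement, lets count() do the tallying and then divides by the group total. The helper name pct_by_group, the copy mtcars2 and the choice to show missing combinations as 0 instead of NA are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same setup as in the question, on a copy so mtcars itself is untouched
mtcars2 <- mtcars
mtcars2$vs <- factor(mtcars2$vs, levels = c("0", "1"))
group_vars <- c("am", "gear", "carb")

# Hypothetical helper: percentage of each vs level within (cyl, var)
pct_by_group <- function(data, var) {
  data %>%
    # count() receives the data frame, not a bare column
    count(cyl, .data[[var]], vs) %>%
    # share of each vs level within every (cyl, var) combination
    mutate(perc = 100 * n / sum(n), .by = c(cyl, all_of(var))) %>%
    pivot_wider(
      id_cols = c(cyl, all_of(var)),
      names_from = vs,
      values_from = perc,
      names_sort = TRUE,  # order the new columns by factor level
      values_fill = 0     # assumption: absent combinations shown as 0
    )
}

result <- map(group_vars, ~ pct_by_group(mtcars2, .x))
names(result) <- group_vars

result$am
```

Because count() returns an ungrouped tibble here, the results carry no leftover grouping, and naming the list elements makes it easier to pull out one breakdown at a time (result$gear, result$carb).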