Filter variables with conditions from a dataset

Here is the dataset:

library(readr)  # provides read_csv()

data <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv")

Step 1: Find the company manufacturers that have at least 10 ratings. I need to count how many ratings each company manufacturer has and keep only those with 10 or more. This is my attempt so far:

data %>% 
  group_by(company_manufacturer) %>% 
  summarise(count(rating, na.rm=TRUE) >= 10)

Step 2: Add two more columns containing the mean rating and the standard deviation of the ratings for each company_manufacturer.

Solution

Something like this?

library(dplyr)

data %>% 
  group_by(company_manufacturer) %>% 
  summarise(average_rating = mean(rating, na.rm = TRUE),
            sd_rating = sd(rating, na.rm = TRUE),
            n = n()) %>% 
  filter(n >= 10)

   company_manufacturer         average_rating sd_rating     n
   <chr>                                 <dbl>     <dbl> <int>
 1 A. Morin                               3.42     0.417    26
 2 Altus aka Cao Artisan                  2.86     0.282    11
 3 Amedei                                 3.31     0.356    13
 4 Arete                                  3.53     0.322    32
 5 Artisan du Chocolat                    3.08     0.663    16
 6 Bittersweet Origins                    3.27     0.317    14
 7 Bonnat                                 3.47     0.560    30
 8 Brasstown aka It's Chocolate           3.55     0.292    11
 9 Cacao de Origen                        3.12     0.429    10
10 Castronovo                             3.38     0.436    19
# ... with 43 more rows
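
If you would rather keep every original row and attach the group statistics as new columns (closer to the "mutate" wording in Step 2), a grouped filter() followed by mutate() might work. This is only a sketch on a small made-up data frame: `toy`, its values, and the lowered threshold `min_ratings` are assumptions for illustration, not part of the chocolate data. It also counts only non-missing ratings, which can differ from n() when a group contains NA values.

library(dplyr)

# Hypothetical toy data standing in for the chocolate dataset
toy <- tibble(
  company_manufacturer = c(rep("A", 3), rep("B", 2)),
  rating = c(3, 3.5, 4, 2.5, NA)
)

# The question uses 10; lowered here so the toy data has a group that passes
min_ratings <- 3

result <- toy %>%
  group_by(company_manufacturer) %>%
  # keep only groups with enough non-missing ratings
  filter(sum(!is.na(rating)) >= min_ratings) %>%
  # add per-group statistics to every remaining row
  mutate(average_rating = mean(rating, na.rm = TRUE),
         sd_rating = sd(rating, na.rm = TRUE)) %>%
  ungroup()

result

With the real data you would replace `toy` with `data` and set `min_ratings` to 10. Unlike summarise(), this keeps one row per chocolate bar rather than one row per manufacturer.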