Here is the dataset:
library(readr)  # provides read_csv()
data <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv")
Step 1: Find the company manufacturers that have at least 10 ratings. So I need to count how many ratings each company manufacturer has and keep only those with 10 or more ratings.
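# My attempt. Note that count() expects a data frame, not a column,
# so calling it on rating inside summarise() does not work.
data %>%
  group_by(company_manufacturer) %>%
  summarise(count(rating, na.rm=TRUE) >= 10)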
Step 2: Add two more columns containing the mean rating and the standard deviation of the ratings for each company_manufacturer.
Something like this?
library(dplyr)
data %>%
  group_by(company_manufacturer) %>%
  summarise(average_rating = mean(rating, na.rm = TRUE),
            sd_rating = sd(rating, na.rm = TRUE),
            n = n()) %>%
  filter(n >= 10)
   company_manufacturer         average_rating sd_rating     n
   <chr>                                 <dbl>     <dbl> <int>
 1 A. Morin                               3.42     0.417    26
 2 Altus aka Cao Artisan                  2.86     0.282    11
 3 Amedei                                 3.31     0.356    13
 4 Arete                                  3.53     0.322    32
 5 Artisan du Chocolat                    3.08     0.663    16
 6 Bittersweet Origins                    3.27     0.317    14
 7 Bonnat                                 3.47     0.560    30
 8 Brasstown aka It's Chocolate           3.55     0.292    11
 9 Cacao de Origen                        3.12     0.429    10
10 Castronovo                             3.38     0.436    19
# ... with 43 more rows
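
If the goal is literally to mutate, that is, to keep one row per rating and attach the group statistics as extra columns, a grouped filter() followed by mutate() might be closer to what Step 2 describes. The following is only a sketch: the helper name add_rating_stats and its min_n argument are hypothetical, and it assumes that "number of ratings" means non-missing values of rating (the summarise() version above counts all rows with n()).

```r
library(dplyr)

# Hypothetical helper (not part of the original post).
# Keeps every row of manufacturers that have at least `min_n` non-missing
# ratings and adds the per-manufacturer mean and standard deviation.
add_rating_stats <- function(df, min_n = 10) {
  df %>%
    group_by(company_manufacturer) %>%
    # Inside a grouped filter(), the condition is evaluated once per group
    filter(sum(!is.na(rating)) >= min_n) %>%
    mutate(average_rating = mean(rating, na.rm = TRUE),
           sd_rating = sd(rating, na.rm = TRUE)) %>%
    ungroup()
}
```

Calling add_rating_stats(data) should then return the original columns plus average_rating and sd_rating, restricted to the manufacturers that pass the threshold.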