r data-visualization data-wrangling

Filter variables with conditions from a dataset


Here is the dataset:

library(readr)
data <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv")

Step 1: Find the company manufacturers that have at least 10 ratings. So I need to count how many ratings each company manufacturer has and keep only those with 10 or more ratings.

data %>% 
  group_by(company_manufacturer) %>% 
  summarise(count(rating, na.rm=TRUE) >= 10)
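A likely reason this attempt fails is that dplyr's `count()` is a data-frame verb: it expects a data frame as its first argument, not a column vector, so it cannot be called on `rating` inside `summarise()`. A minimal sketch of the counting step, assuming "number of ratings" means non-missing values of `rating`. The `toy` tibble, the `n_ratings` column, and the lowered `min_ratings` threshold are all hypothetical and only there to keep the example small:

```r
library(dplyr)

# Hypothetical toy data standing in for the chocolate dataset
toy <- tibble(
  company_manufacturer = c("A", "A", "A", "B", "B", "C"),
  rating = c(3.5, 3.0, NA, 2.5, 3.0, 4.0)
)

# Assumption: threshold lowered from 10 so the toy data shows the effect
min_ratings <- 2

rating_counts <- toy %>%
  group_by(company_manufacturer) %>%
  # count the non-missing ratings in each group
  summarise(n_ratings = sum(!is.na(rating))) %>%
  # keep only manufacturers that meet the threshold
  filter(n_ratings >= min_ratings)

rating_counts
```

With the real data, `min_ratings` would be 10. If missing ratings should count as well, `n()` can replace `sum(!is.na(rating))`.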

Step 2: Add two more columns containing the mean rating and the standard deviation of the ratings for each company_manufacturer.


Solution

  • Something like this?

    library(dplyr)
    
    data %>% 
      group_by(company_manufacturer) %>% 
      summarise(average_rating = mean(rating, na.rm = TRUE),
                sd_rating = sd(rating, na.rm = TRUE),
                n = n()) %>% 
      filter(n >= 10) 
    
       company_manufacturer         average_rating sd_rating     n
       <chr>                                 <dbl>     <dbl> <int>
     1 A. Morin                               3.42     0.417    26
     2 Altus aka Cao Artisan                  2.86     0.282    11
     3 Amedei                                 3.31     0.356    13
     4 Arete                                  3.53     0.322    32
     5 Artisan du Chocolat                    3.08     0.663    16
     6 Bittersweet Origins                    3.27     0.317    14
     7 Bonnat                                 3.47     0.560    30
     8 Brasstown aka It's Chocolate           3.55     0.292    11
     9 Cacao de Origen                        3.12     0.429    10
    10 Castronovo                             3.38     0.436    19
    # ... with 43 more rows
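The `summarise()` approach above returns one row per manufacturer. If Step 2 is read literally, as adding the two columns to every row of the original data, a grouped `filter()` followed by `mutate()` is one possible variant. This is a sketch under that assumption, not necessarily what the question intends. The `toy` tibble and the lowered `min_ratings` threshold are hypothetical:

```r
library(dplyr)

# Hypothetical toy data standing in for the chocolate dataset
toy <- tibble(
  company_manufacturer = c("A", "A", "A", "B", "B", "C"),
  rating = c(3.0, 3.5, 4.0, 2.0, 3.0, 4.0)
)

# Assumption: threshold lowered from 10 so the toy data shows the effect
min_ratings <- 2

per_row <- toy %>%
  group_by(company_manufacturer) %>%
  # keep all rows of groups that have enough rows
  filter(n() >= min_ratings) %>%
  # attach the group-level statistics to each remaining row
  mutate(average_rating = mean(rating, na.rm = TRUE),
         sd_rating = sd(rating, na.rm = TRUE)) %>%
  ungroup()

per_row
```

Here manufacturer "C" is dropped because it has a single row, and each remaining row carries its manufacturer's mean and standard deviation. Note that `n()` counts rows, including any with a missing `rating`.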