Search code examples
rdplyrreadr

How to create new rows based on data from a different table (R)


So, if I had a data table like this:

stores <- read_csv("stores.csv")
stores

# A tibble: 6 x 3
  state      store   num_locations
  <chr>      <chr>           <dbl>
1 california target             20
2 california walmart            29
3 nevada     target             10
4 nevada     walmart            12
5 arizona    target             15
6 arizona    walmart            19

Then, I create a new data frame without the location information:

stores_2 <- select(stores, store, num_locations)

# A tibble: 6 x 2
  store   num_locations
  <chr>           <dbl>
1 target             20
2 walmart            29
3 target             10
4 walmart            12
5 target             15
6 walmart            19

Is there a way I can create a third data set that provides an average number of locations, like this (I'm not sure how to actually generate this tibble):

# A tibble: 6 x 2
  store   avg_num_locations
  <chr>           <dbl>
1 target             15
2 walmart            20

Solution

  • One solution is to use tidyverse functions group_by() and summarise():

    library(tidyverse)
    
    stores <- data.frame(
      stringsAsFactors = FALSE,
                           state = c("california","california","nevada","nevada","arizona",
                                     "arizona"),
                           store = c("target",
                                     "walmart","target","walmart","target",
                                     "walmart"),
         num_locations = c(20L, 29L, 10L, 12L, 15L, 19L)
              )
    
    stores_summary <- stores %>%
      group_by(store) %>%
      summarise(avg_num_locations = mean(num_locations))
    
    stores_summary
    # A tibble: 2 x 2
    #  store   avg_num_locations
    #  <chr>               <dbl>
    #1 target                 15
    #2 walmart                20