So, if I had a data table like this:
stores <- read_csv("stores.csv")
stores
# A tibble: 6 x 3
state store num_locations
<chr> <chr> <dbl>
1 california target 20
2 california walmart 29
3 nevada target 10
4 nevada walmart 12
5 arizona target 15
6 arizona walmart 19
Then, I create a new data frame without the location information:
stores_2 <- select(stores, store, num_locations)
# A tibble: 6 x 2
store num_locations
<chr> <dbl>
1 target 20
2 walmart 29
3 target 10
4 walmart 12
5 target 15
6 walmart 19
Is there a way I can create a third data set that provides an average number of locations, like this (I'm not sure how to actually generate this tibble):
# A tibble: 6 x 2
store avg_num_locations
<chr> <dbl>
1 target 15
2 walmart 20
One solution is to use tidyverse functions group_by()
and summarise()
:
library(tidyverse)
stores <- data.frame(
stringsAsFactors = FALSE,
state = c("california","california","nevada","nevada","arizona",
"arizona"),
store = c("target",
"walmart","target","walmart","target",
"walmart"),
num_locations = c(20L, 29L, 10L, 12L, 15L, 19L)
)
stores_summary <- stores %>%
group_by(store) %>%
summarise(avg_num_locations = mean(num_locations))
stores_summary
# A tibble: 2 x 2
# store avg_num_locations
# <chr> <dbl>
#1 target 15
#2 walmart 20