Search code examples
rdataframepivottidyrwrangle

How can I pivot wider and transform my data frame?


I have a data frame like this:

tibble(
  School = c(1, 1, 2, 3, 3, 4),
  City = c("A","A", "B", "C", "C", "B"),
  Grade = c("7th", "7th", "7th", "6th", "8th", "8th"),
  Number_Students = c(20, 23, 25, 21, 28, 34),
  Type_school = c("public", "public", "private", "public", "public", "private")
  )
ID School City Grade Number_Students Type_school
1 1 A 7th 20 public
2 1 A 7th 23 public
3 2 B 7th 25 private
4 3 C 6th 21 public
5 3 C 8th 28 public
6 4 B 8th 34 private

The unit of analysis is the classrooms, but I would like to turn it into a data frame where the unit of analysis is the school, but with some computations. Like this:

tibble(
  School = c(1, 2, 3, 4),
  City = c("A", "B", "C", "B"),
  N_6th = c(0, 0, 1, 0), # here is the number of grade 6h classrooms in each school 
  N_7th = c(2,1,0,0),
  N_8th = c(0,0,1,1),
  Students_6th = c(0, 0, 25, 0), # here is the number of students in grade 6th from each school (the sum of all 7th grade classrooms from each school)
  Students_7th = c(43, 25, 0, 0), 
  Students_8th = c(0, 0, 28, 34),
  Type_school = c("public", "private", "public", "private")
)
School City N_6th N_7th N_8th Students_6th Students_7th Students_8th Type_school
1 A 0 2 0 0 43 0 public
2 B 0 1 0 0 25 0 private
3 C 1 0 1 25 0 28 public
4 B 0 0 1 0 0 34 private

I'm trying the pivot_wider(), but that's not enough for my needs. I need to sum the number of classrooms of the same grade in each school and the number of students in the same grade from each school.


Solution

  • Do a group by and return the count, and the sum of 'Number_Students' and then use pivot_wider with names_from specified as the 'Grade' and the values_from as a vector of columns

    library(dplyr)
    library(tidyr)
    df1 %>%
      group_by(School, City, Grade, Type_school) %>%
      summarise(N = n(), Students = sum(Number_Students), .groups = 'drop') %>%
      pivot_wider(names_from = Grade, values_from = c(N, Students), values_fill = 0)
    

    -output

    # A tibble: 4 × 9
      School City  Type_school N_7th N_6th N_8th Students_7th Students_6th Students_8th
       <dbl> <chr> <chr>       <int> <int> <int>        <dbl>        <dbl>        <dbl>
    1      1 A     public          2     0     0           43            0            0
    2      2 B     private         1     0     0           25            0            0
    3      3 C     public          0     1     1            0           21           28
    4      4 B     private         0     0     1            0            0           34