Search code examples
rtidyversepivot-table

Pivot a data frame to show metrics on rows, and years in column in R for each group


I have the following data frame in R

df <- data.frame(
  year = c(2018, 2018, 2019, 2019), 
  group = c("A", "A", "B", "B"), 
  metric = c(100, 150, 150, 200), 
  metric_percent = c(0.1, 0.2,0.3, 0.4))
year group metric metric_percent
2018 A 100 0.1
2018 A 150 0.2
2019 B 150 0.3
2019 B 200 0.4

When I ran the following code, the output creates two columns for headcount & year, and two columns for representation and year.

df |> 
  pivot_wider(names_from = year,
              values_from = c(headcount, representation))

# Output
# A tibble: 2 × 5
  group headcount_2018 headcount_2019 representation_2018 representation_2019
  <chr> <list>         <list>         <list>              <list>             
1 A     <dbl [2]>      <NULL>         <dbl [2]>           <NULL>             
2 B     <NULL>         <dbl [2]>      <NULL>              <dbl [2]>   

In my real example I have several groups and metrics to display so I need to see the data in a table in order to see the evolution of each metric for each group.

This is how I'd like to display the data in a table (I'll use flextable or gt for formatting it)

df2 <- data.frame(
  group = c("A", "A", "B", "B"),
  metric = c("headcount", "representation", "headcount", "representation"),
  "2018" = c(100, .1, 150, 0.3),
  "2019" = c(150, 0.2, 200, 0.4))

df2
group metric 2018 2019
A headcount 100.0 150.0
A representation 0.1 0.2
B headcount 150.0 200.0
B representation 0.3 0.4

Thanks in advance!


Solution

  • As @stefan said in the comments your data as it is posted cannot be reshaped into the desired format. Only if this is your data

    dat
    # A tibble: 4 × 4
      year  group headcount representation
      <chr> <chr>     <dbl>          <dbl>
    1 2018  A           100            0.1
    2 2019  A           150            0.2
    3 2018  B           150            0.3
    4 2019  B           200            0.4
    

    it can be reshaped into the expected format

    library(dplyr)
    library(tidyr)
    
    dat %>% 
      pivot_longer(-c(year, group), names_to="metric") %>% 
      pivot_wider(names_from=year, values_from=value)
    # A tibble: 4 × 4
      group metric         `2018` `2019`
      <chr> <chr>           <dbl>  <dbl>
    1 A     headcount       100    150  
    2 A     representation    0.1    0.2
    3 B     headcount       150    200  
    4 B     representation    0.3    0.4
    

    Data

    dat <- structure(list(year = c("2018", "2019", "2018", "2019"), group = c("A", 
    "A", "B", "B"), headcount = c(100, 150, 150, 200), representation = c(0.1, 
    0.2, 0.3, 0.4)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
    -4L))