Search code examples
rpivottidyversetidyr

Transform data to long with grouped columns


For this week's tidytuesday challenge, for some reason, I am not able to group the column names in R which I was doing with pivot_longer function from tidyr previously. So, here is my code and I do not get it why it does throw an error and not give what I want.

library(tidyverse)

tuesdata <- tidytuesdayR::tt_load(2023, week = 7)
age_gaps <- tuesdata$age_gaps

df_long <- age_gaps %>%
  pivot_longer(cols= actor_1_name:actor_2_name, names_to = "actornumber", values_to = "actorname") %>%
  pivot_longer(cols= character_1_gender:character_2_gender, names_to = "gendernumber", values_to = "gender") %>%
  pivot_longer(cols= actor_1_age:actor_2_age, names_to = "agenumber", values_to = "age") %>%
  select(movie_name, release_year, director, age_difference, actorname, gender, age) 

As seen from the code, the initial data has 1155 rows and after doing the quick data wrangling, I am expecting to get a data of 1155x2=2310 rows as I would like to merge the columns on actor names and their relevant information such as age and birthdate. Yet, the code does not give me the expected outcome and I am wondering why and how can I solve this problem. Thank you for your attention beforehand.

Example data (first 6 rows)
age_gaps <- structure(list(movie_name = c("Harold and Maude", "Venus", "The Quiet American", 
"The Big Lebowski", "Beginners", "Poison Ivy"), release_year = c(1971, 
2006, 2002, 1998, 2010, 1992), director = c("Hal Ashby", "Roger Michell", 
"Phillip Noyce", "Joel Coen", "Mike Mills", "Katt Shea"), age_difference = c(52, 
50, 49, 45, 43, 42), couple_number = c(1, 1, 1, 1, 1, 1), actor_1_name = c("Ruth Gordon", 
"Peter O'Toole", "Michael Caine", "David Huddleston", "Christopher Plummer", 
"Tom Skerritt"), actor_2_name = c("Bud Cort", "Jodie Whittaker", 
"Do Thi Hai Yen", "Tara Reid", "Goran Visnjic", "Drew Barrymore"
), character_1_gender = c("woman", "man", "man", "man", "man", 
"man"), character_2_gender = c("man", "woman", "woman", "woman", 
"man", "woman"), actor_1_birthdate = structure(c(-26725, -13666, 
-13442, -14351, -14629, -13278), class = "Date"), actor_2_birthdate = structure(c(-7948, 
4536, 4656, 2137, 982, 1878), class = "Date"), actor_1_age = c(75, 
74, 69, 68, 81, 59), actor_2_age = c(23, 24, 20, 23, 38, 17)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • You could set ".value" in names_to and supply one of names_sep or names_pattern to specify how the column names should be split.

    library(tidyr)
    
    age_gaps %>%
      pivot_longer(actor_1_name:actor_2_age,
                   names_prefix = "(actor|character)_", 
                   names_to = c("actor", ".value"),
                   names_sep = '_')
    
    # A tibble: 12 × 10
       movie_name         release_year director      age_difference couple_number actor name                gender birthdate    age
       <chr>                     <dbl> <chr>                  <dbl>         <dbl> <chr> <chr>               <chr>  <date>     <dbl>
     1 Harold and Maude           1971 Hal Ashby                 52             1 1     Ruth Gordon         woman  1896-10-30    75
     2 Harold and Maude           1971 Hal Ashby                 52             1 2     Bud Cort            man    1948-03-29    23
     3 Venus                      2006 Roger Michell             50             1 1     Peter O'Toole       man    1932-08-02    74
     4 Venus                      2006 Roger Michell             50             1 2     Jodie Whittaker     woman  1982-06-03    24
     5 The Quiet American         2002 Phillip Noyce             49             1 1     Michael Caine       man    1933-03-14    69
     6 The Quiet American         2002 Phillip Noyce             49             1 2     Do Thi Hai Yen      woman  1982-10-01    20
     7 The Big Lebowski           1998 Joel Coen                 45             1 1     David Huddleston    man    1930-09-17    68
     8 The Big Lebowski           1998 Joel Coen                 45             1 2     Tara Reid           woman  1975-11-08    23
     9 Beginners                  2010 Mike Mills                43             1 1     Christopher Plummer man    1929-12-13    81
    10 Beginners                  2010 Mike Mills                43             1 2     Goran Visnjic       man    1972-09-09    38
    11 Poison Ivy                 1992 Katt Shea                 42             1 1     Tom Skerritt        man    1933-08-25    59
    12 Poison Ivy                 1992 Katt Shea                 42             1 2     Drew Barrymore      woman  1975-02-22    17