Search code examples
rdplyrpivot

Trying to put the data from wide to long with multiple variables that are each different R


I am trying to use pivot_longer to move my data from wide to long in R. However, I have multiple different variable that will need to be created. I want to create a column called "Assessment" that has "pre" or "post" within it. Then I want a column that says "Test" with "A or B" in it, then I want a "Error" column with the values for only the columns that currently say "error" and then a "Time" column for the values for only the columns that currently say "time" in them. Right now the data looks like this:

data.frame(
  stringsAsFactors = FALSE,
      Testing.Date = c("5/25/23","5/30/23","5/31/23",
                       "6/1/23","6/27/23","6/20/23"),
                ID = c(101L, 103L, 106L, 109L, 110L, 201L),
         Condition = c("Lab", "Nature", "Nature", "Lab", "Nature", "Lab"),
               Age = c(21L, 22L, 22L, 21L, 29L, 56L),
        Pre.Time.A = c(22.78, 15.75, 30.33, 18.43, 26.99, 24.94),
       Pre.Error.A = c(0L, 0L, 1L, 0L, 0L, 0L),
        Pre.Time.B = c(60.38, 61.65, 65.6, 65.73, 58.63, 60.97),
     Pre.Error.B = c(0L, 0L, 1L, 2L, 0L, 0L),
       Post.Time.A = c(21.48, 14.65, 20.66, 24.55, 23.87, 35.7),
      Post.Error.A = c(0L, 0L, 0L, 0L, 0L, 0L),
       Post.Time.B = c(45.88, 48.58, 56.45, 48.95, 53, 64.3),
      Post.Error.B = c(0L, 0L, 0L, 0L, 0L, 0L)
)

I want it to look like this:

TMT_long <- tibble::tribble(
              ~Testing.Date,  ~ID, ~Condition, ~Age, ~Assessment, ~Test, ~Time, ~Errors,
                  "5/25/23", 101L,      "Lab",  21L,       "Pre",   "A", 22.78,      0L,
                  "5/25/23", 101L,      "Lab",  21L,       "Pre",   "B", 60.38,      0L,
                  "5/25/23", 101L,      "Lab",  21L,      "Post",   "A", 21.48,      0L,
                  "5/25/23", 101L,      "Lab",  21L,      "Post",   "B", 45.88,      0L
              )

I can't seem to figure out the right syntax in pivot_longer


Solution

  • Here's a one-liner:

    tidyr::pivot_longer(df, matches("P"), names_pattern = "(.*)\\.(.*)\\.(.*)", names_to = c("Assessment", ".value", "Test"), values_drop_na = TRUE)
    

    Output:

    # A tibble: 24 × 8
       Testing.Date    ID Condition   Age Assessment Test   Time Error
       <chr>        <int> <chr>     <int> <chr>      <chr> <dbl> <int>
     1 5/25/23        101 Lab          21 Pre        A      22.8     0
     2 5/25/23        101 Lab          21 Pre        B      60.4     0
     3 5/25/23        101 Lab          21 Post       A      21.5     0
     4 5/25/23        101 Lab          21 Post       B      45.9     0
     5 5/30/23        103 Nature       22 Pre        A      15.8     0
     6 5/30/23        103 Nature       22 Pre        B      61.6     0
     7 5/30/23        103 Nature       22 Post       A      14.6     0
     8 5/30/23        103 Nature       22 Post       B      48.6     0
     9 5/31/23        106 Nature       22 Pre        A      30.3     1
    10 5/31/23        106 Nature       22 Pre        B      65.6     1
    # ℹ 14 more rows
    

    Notes:

    • thanks to Josh for the matches() idea, I hadn't seen that before
    • names_pattern essentially matches the column names as a regex, turning the prefix and suffix values into values in the Assessment and Test columns respectively, and the central part into the values of the time and error columns