Search code examples
rtidyr

Make wide dataframe long with different variable names


I have a dataframe that looks like this:

example <- data.frame(
  date = as.Date(c('2001-01-01',
                   '2001-01-02',
                   '2001-01-01',
                   '2001-01-02')),
  PID_A = c(1091, 1091, 1037, 1037),
  PID_B = c(2091, 2091, 2037, 2037),
  resp_A = c(3,1,2,4),
  resp_B = c(2,4,3,1),
  xxx_connect_A = c(6,2,5,3),
  xxx_connect_B = c(5,3,6,2)
)

I want to collapse across _As and _Bs to form one column for each variable.

Here is the desired output:

example_solution <- data.frame(
  date = as.Date(rep(c('2001-01-01',
                       '2001-01-02'),4)),
  PID = c(1091, 1091, 2091, 2091, 1037, 1037, 2037, 2037),
  resp = c(3,1,2,4,2,4,3,1),
  xxx_connect = c(6,2,5,3,5,3,6,2)
)

I tried this code from a previous stackoverflow answer but it doesn't work as well now because my variables have differing number of underscores.

example |> 
  pivot_longer(-date, names_to = c(".value", "who"), names_sep = "_")

Does anyone know an efficient way to fix this?

Thank you!


Solution

  • Instead of names_sep, you can use names_pattern with a regular expression to split the data

    example |> 
      pivot_longer(-date, names_to = c(".value", "who"), names_pattern = "(.+)_([^_]+)$")
    #   date       who     PID  resp xxx_connect
    #   <date>     <chr> <dbl> <dbl>       <dbl>
    # 1 2001-01-01 A      1091     3           6
    # 2 2001-01-01 B      2091     2           5
    # 3 2001-01-02 A      1091     1           2
    # 4 2001-01-02 B      2091     4           3
    # 5 2001-01-01 A      1037     2           5
    # 6 2001-01-01 B      2037     3           6
    # 7 2001-01-02 A      1037     4           3
    # 8 2001-01-02 B      2037     1           2
    

    The regular expression looks for the last "_" that's at the end of the string and splits there.