Search code examples
rpivotreshapetidyrdata-manipulation

pivot multiple sets of columns at once


I have a data frame where I have regressed a bunch of datasets, then taken subsets of those datasets and regressed again. This has resulted in a dataframe with columns showing the slope and intercept and standard error of the "full" dataset, then more columns showing those things for the "subset" dataset.

I want to transform the dataset to long format, where a column shows which type it is (full or subset), then other columns show the slope, se, intercept, etc.

I have figured out a way of doing this by doing pivot_longer many different times then filtering to where the new columns created by the pivot match up, but that can not be the best way to do this. I'm wondering if there's a way to list sets of columns in the pivot function to skip this big chunk of code. reprex below.

# made dataframe
df <- 
  tribble(
    ~trial,   ~full_slope,    ~full_slope_se,    ~subset_slope,  ~subset_slope_se,
     1,            10,              1,                12,               2.5,
     2,             9,               1.2,             8.5,               3,
     3,             9.5,              2,               9.9,              3
  )


# pivot 

df %>%
  # first pivot the slope columns
  pivot_longer(cols = c(full_slope, subset_slope),
               names_to = "type",
               values_to = "slope") %>%

  # next pivot the SE columns
  pivot_longer(cols = c(full_slope_se, subset_slope_se),
               names_to = "type_se",
               values_to = "se") %>%

 # add a column for when they match up (slope and se both from same dataset, full or subset)
  mutate(
    data_type = 
      case_when(
        type == "full_slope" & type_se == "full_slope_se" ~ "full",
        type == "subset_slope" & type_se == "subset_slope_se" ~ "subset"
        )) %>%

  # remove rows that are musmatched
  filter(!is.na(data_type)) %>%

  # remove extra columns made by pivots
  select(-type, -type_se) %>%
  relocate(data_type, .after = trial)

This DOES give me my desired output, I just feel like this can not be the way I am supposed to do this. Thanks in advance!


Solution

  • Use names_pattern=:

    df %>%
      pivot_longer(-trial, names_pattern = "([^_]*)_(.*)", names_to = c("data_table", ".value"))
    # # A tibble: 6 x 4
    #   trial data_table slope slope_se
    #   <dbl> <chr>      <dbl>    <dbl>
    # 1     1 full        10        1  
    # 2     1 subset      12        2.5
    # 3     2 full         9        1.2
    # 4     2 subset       8.5      3  
    # 5     3 full         9.5      2  
    # 6     3 subset       9.9      3