Search code examples
rdplyrpivot

How to use pivot_longer to combine multiple columns in R with dplyr


I'm working with a data set that looks something like this. I'm trying to use pivot_longer and dplyr to get the means and sd under their own respective columns so I can do some plotting but I can't seem to get it to work properly. Could someone give me some tips?

day mean_depth depth_sd mean_temp temp_sd
2021-01-01 20.5 5.2 15.8 3.8
2021-01-02 15.5 3.9 15.6 2.2

I hope to end up with a table that looks like this:

day variable mean sd
2021-01-01 depth 20.5 5.2
2021-01-01 temp 15.8 3.8
2021-01-02 depth 15.5 3.9
2021-01-02 temp 15.6 2.2

Solution

  • If you clean up the column names so the variable is awlays first or always last, this is easier. For example

    library(dplyr)
    library(tidyr)
    dd %>% 
      rename_with(.cols=starts_with("mean"), function(x) gsub("([^_]*)_(.*)", "\\2_\\1", x)) %>% 
      pivot_longer(-day, names_pattern="(.*)_(.*)", names_to=c("variable", ".value"))
    

    This returns

      day        variable  mean    sd
      <chr>      <chr>    <dbl> <dbl>
    1 2021-01-01 depth     20.5   5.2
    2 2021-01-01 temp      15.8   3.8
    3 2021-01-02 depth     15.5   3.9
    4 2021-01-02 temp      15.6   2.2
    

    Tested with

    dd <- read.table(text="day  mean_depth  depth_sd    mean_temp   temp_sd
    2021-01-01  20.5    5.2 15.8    3.8
    2021-01-02  15.5    3.9 15.6    2.2", header=T)