Search code examples
rdplyrpivotreshape2

pivot_longer the ouput of summarize_each with mean and sd functions


I am trying to summarize all columns in my dataset (which has many columns, although the rep example bellow only has 2), getting mean and sd for each variable. I want the output to be in long format.

#Example dataset
d <- iris %>% select(Sepal.Length,Sepal.Width)
names(d) <- c("SepalLength","SepalWidth")

#Summarizing and trying to make it long
s <- d %>% summarize_each( list(mean=mean,sd=sd) )  # has summar stats, but they are in wide format

# trying to pivot.
s %>% pivot_longer( ??? what do I put here ???)

I tried a few variations (ex: pivot_longer(names_to = "key", values_to = "value")) but always get an error message.


Solution

  • We can use select_helpers inside

    library(dplyr)
    library(tidyr)
    s %>%
       pivot_longer(everything())
    # A tibble: 4 x 2
    #  name             value
    #  <chr>            <dbl>
    #1 SepalLength_mean 5.84 
    #2 SepalWidth_mean  3.06 
    #3 SepalLength_sd   0.828
    #4 SepalWidth_sd    0.436
    

    Or if we need 'SepalLength', 'SepalWidth' as two columns

    s %>% 
       pivot_longer(cols = everything(), 
           names_to = c(".value", "statistic"), names_sep="_")
    # A tibble: 2 x 3
    #  statistic SepalLength SepalWidth
    #  <chr>           <dbl>      <dbl>
    #1 mean            5.84       3.06 
    #2 sd              0.828      0.436
    

    Or if we need the 'mean', 'sd' as two columns

    s %>%
        pivot_longer(cols = everything(), 
           names_to = c("colNames", ".value"), names_sep="_")
    # A tibble: 2 x 3
    #  colNames     mean    sd
    #  <chr>       <dbl> <dbl>
    #1 SepalLength  5.84 0.828
    #2 SepalWidth   3.06 0.436
    

    Or using gather

    s %>%
      gather