Apply multiple summary function across columns: summarise_all: 'list' object cannot be coerced to type 'double'

I am trying to create a descriptive statistics table, obtaining for each column of a data frame the following statistics: mean, sd, 10th, 50th, and 90th quantiles. I then want to transpose the dataset so that the columns are the different statistics, and each row is a variable in the dataset.

Here is a sample dataset:

dt <- data.frame(id = 1:100,
                 Numeric_Column_1 = rnorm(100),
                 Numeric_Column_2 = rnorm(100),
                 Numeric_Column_3 = rnorm(100),
                 Numeric_Column_4 = rnorm(100),
                 Numeric_Column_5 = rnorm(100))

and the code that should generate the table:


desc_table <- dt %>% select(-id)  %>%
  dplyr::summarise_all(.funs = list(mean=mean(.,na.rm=T),
                                    sd=sd(.,na.rm=T), 
                                    P10=~quantile(., c(0.1), na.rm=T),
                                    P50=~quantile(., c(0.5), na.rm=T),
                                    P90=~quantile(., c(0.9), na.rm=T)), 
                                    na.rm=TRUE) %>%
  pivot_longer(cols = everything()) %>%
  separate(name,c("Variable", "Stat"),sep = "_") %>%
  pivot_wider(names_from = "Stat", values_from = "value") %>%
  mutate(mean = round(mean, 2), sd= round(sd, 2))

However I get the following error:

Error in is.data.frame(x):
  'list' object cannot be coerced to type 'double'
In addition: Warning message:
In mean.default (., na.rm = T):
  argument is not numeric or logical: returning NA

How can I fix this?

Solution

Try this, modifying your code to fit the modern idiom and changing the separator in your <colname><separator><statistic> idiom from "_" to "." to avoid a clash with your column names (which might have been the source of your error)...

dt %>%
  dplyr::summarise(
    across(
      -id,
      list(
        mean = \(x) mean(x, na.rm = TRUE), 
        sd = \(x) sd(x, na.rm = TRUE), 
        P10 = \(x) quantile(x, 0.1, na.rm = TRUE), 
        P50 = \(x) quantile(x, 0.5, na.rm = TRUE), 
        P90 = \(x) quantile(x, 0.9, na.rm = TRUE)
      ),
      .names = "{.col}.{.fn}"
    ) 
  ) %>%
  pivot_longer(
    everything(), 
    names_sep = "\\.", 
    names_to = c("Variable", "Stat")
  ) %>%
  pivot_wider(names_from = "Stat", values_from = "value") %>%
  mutate(mean = round(mean, 2), sd= round(sd, 2))
# A tibble: 5 × 6
  Variable          mean    sd   P10     P50   P90
  <chr>            <dbl> <dbl> <dbl>   <dbl> <dbl>
1 Numeric_Column_1 -0.04  0.94 -1.20 -0.0872  1.11
2 Numeric_Column_2 -0.15  1.03 -1.46 -0.107   1.07
3 Numeric_Column_3  0.11  1.01 -1.53  0.229   1.14
4 Numeric_Column_4  0.09  1.05 -1.17  0.103   1.53
5 Numeric_Column_5 -0.02  1.02 -1.34 -0.0238  1.38

The use of .names in the across call removes the need for your separate step.

It might be better in the long run to drop the last element of the pipe and replace it with knitr::kable(digits = 2). This maintains the internal accuracy of your summary whilst formatting it as you request for display.

Also, see this page for an explanation of why you should use TRUE and FALSE rather than T and F.