I have a dataset and I need to create a simple table with the number of observations, means, and standard deviations of all the variables (columns). I can't find a way to get only the required 3 summary statistics. Everything I tried keeps giving me min, max, median, 1st and 3rd quartiles, etc. The table should look something like this (with a title):
Table 1: Table Title
_______________________________________
Variables Observations Mean Std.Dev
_______________________________________
Age 30 24 2
... . . .
... . . .
_______________________________________
The summary ()
does not work because it gives too many other summary statistics. I have done this:
sapply(dataset, function(x) list(means=mean(x,na.rm=TRUE), sds=sd(x,na.rm=TRUE)))
But how do I form the table from this? And is there a better way to do this than using "sapply"?
sapply
does return the values that you want but it is not properly structured.
Using mtcars
data as an example :
#Get the required statistics and convert the data into dataframe
summ_data <- data.frame(t(sapply(mtcars, function(x)
list(means = mean(x,na.rm=TRUE), sds = sd(x,na.rm=TRUE)))))
#Change rownames to new column
summ_data$variables <- rownames(summ_data)
#Remove rownames
rownames(summ_data) <- NULL
#Make variable column as 1st column
cbind(summ_data[ncol(summ_data)], summ_data[-ncol(summ_data)])
Another way would be using dplyr
functions :
library(dplyr)
mtcars %>%
summarise(across(.fns = list(means = mean, sds = sd),
.names = '{col}_{fn}')) %>%
tidyr::pivot_longer(cols = everything(),
names_to = c('variable', '.value'),
names_sep = '_')
# A tibble: 11 x 3
# variable means sds
# <chr> <dbl> <dbl>
# 1 mpg 20.1 6.03
# 2 cyl 6.19 1.79
# 3 disp 231. 124.
# 4 hp 147. 68.6
# 5 drat 3.60 0.535
# 6 wt 3.22 0.978
# 7 qsec 17.8 1.79
# 8 vs 0.438 0.504
# 9 am 0.406 0.499
#10 gear 3.69 0.738
#11 carb 2.81 1.62