
Convert summary of data.frame into a dataframe


How can I convert the summary of a data.frame into a data.frame itself? I need a data.frame to pass to knitr::kable in R Markdown.

In particular, I have this data frame:

d <- data.frame(a=c(1,2,3), b=c(4,5,6))
ds <- summary(d)
class(ds)
# returns "table"
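It may help to look at what summary() actually returns before trying to convert it. The sketch below uses only base R. It suggests that the "table" is really a character matrix whose row names are empty strings and whose column names are padded with spaces. The statistic labels sit inside the cells next to the values, which would explain why the usual conversions keep them there.

```r
d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
ds <- summary(d)

m <- unclass(ds)   # drop the "table" class to see the underlying object
typeof(m)          # "character": every cell is a pre-formatted string
dim(m)             # 6 rows (one per statistic) by 2 columns (a, b)
dimnames(m)        # row names are all "", column names are padded with spaces
m[1, 1]            # label and value share one cell; it starts with "Min."
```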

And I need ds in a data.frame format.

The output I would like would be a data.frame with "Min.", "1st Qu.", "Median", etc. as row names, "a" and "b" as column names, and the numbers in the cells.

as.data.frame doesn't work:

ds.df <- as.data.frame(ds)
print(ds.df)
# Output is messed up: a long data frame (Var1, Var2, Freq) instead of a wide one
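A likely explanation is method dispatch. Because ds has class "table", as.data.frame() calls the table method, as.data.frame.table(). That method produces one row per cell of the table rather than one row per statistic. A quick base R check, assuming the same d as above:

```r
d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
ds <- summary(d)

long <- as.data.frame(ds)   # dispatches to as.data.frame.table()
names(long)                 # "Var1" "Var2" "Freq"
nrow(long)                  # 12: one row per cell of the 6 x 2 table
```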

The code in this related question doesn't work either:

df.df2 <- data.frame(unclass(summary(d)), check.names = FALSE, stringsAsFactors = FALSE)
print(df.df2)
# Output equally messed up: the cells still hold strings such as "Min.   :1.0"

broom::tidy on a table is deprecated and in any case returns an error:

df.df3 <- broom::tidy(ds)
# Returns an error:
# Error: Columns 1 and 2 must be named.
# plus the message:
# 'tidy.table' is deprecated.

as.data.frame.matrix puts "Min." and the other statistic names inside each cell instead of using them as row names:

ds.df3 <- as.data.frame.matrix(ds)
print(ds.df3)
# Returns "Min." and "1st Qu." inside the cells
# instead of as row names
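If only the formatted table is available, one option is to split each "Label:value" cell at its first colon. The labels become row names and the parsed numbers become the values. The helper below, summary_table_to_df, is a hypothetical name, and the code is a sketch under stated assumptions. It assumes every column of the original data frame is numeric and has no NAs, so every cell is filled. It also recovers the numbers only to the precision summary() printed them with, so computing the statistics directly from the data is usually the safer choice.

```r
d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
ds <- summary(d)

# Hypothetical helper: turn a summary() table of numeric columns into a
# numeric data frame. Assumes no NAs and no non-numeric columns.
summary_table_to_df <- function(tab) {
  m <- unclass(tab)                              # character matrix of "Label:value" cells
  labels <- trimws(sub(":.*$", "", m[, 1]))      # text before the first ":"
  values <- apply(m, 2, function(cell) {
    as.numeric(trimws(sub("^[^:]*:", "", cell))) # text after the first ":"
  })
  out <- as.data.frame(values)
  names(out) <- trimws(colnames(m))              # drop the padding summary() adds
  rownames(out) <- labels
  out
}

summary_table_to_df(ds)
```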

Solution

  • We could use the matrix route, which resets the row names but keeps the statistic labels inside the cells

    out <- as.data.frame.matrix(ds)
    row.names(out) <- NULL
    

    -output

    out
                 a             b
    1 Min.   :1.0   Min.   :4.0  
    2 1st Qu.:1.5   1st Qu.:4.5  
    3 Median :2.0   Median :5.0  
    4 Mean   :2.0   Mean   :5.0  
    5 3rd Qu.:2.5   3rd Qu.:5.5  
    6 Max.   :3.0   Max.   :6.0  
    

    If we need Min., 1st Qu., etc. as row names, loop over the columns with sapply and apply summary to each one

    as.data.frame(sapply(d, summary))
    

    -output

              a   b
    Min.    1.0 4.0
    1st Qu. 1.5 4.5
    Median  2.0 5.0
    Mean    2.0 5.0
    3rd Qu. 2.5 5.5
    Max.    3.0 6.0
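One caveat with the sapply route: summary() adds an "NA's" entry only for columns that actually contain missing values. If some columns have NAs and others do not, the per-column results differ in length, so sapply cannot simplify them and returns a list instead of a matrix. The sketch below sidesteps this by always returning the same seven statistics. stats_table is a hypothetical helper. It keeps numeric columns only and calls quantile() with its default type 7, which is also the default summary() uses for numeric vectors.

```r
# Hypothetical helper: the statistics summary() reports for numeric columns,
# plus an NA count that is always present, returned as a data frame.
stats_table <- function(df) {
  num <- Filter(is.numeric, df)   # non-numeric columns are dropped
  m <- vapply(num, function(x) {
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
    c("Min." = q[1], "1st Qu." = q[2], "Median" = q[3],
      "Mean" = mean(x, na.rm = TRUE),
      "3rd Qu." = q[4], "Max." = q[5], "NA's" = sum(is.na(x)))
  }, numeric(7))
  as.data.frame(m)                # row names come from the first result's names
}

d2 <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))
stats_table(d2)
```

The result is an ordinary data frame with the statistics as row names, which is the shape the question asks for.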