Search code examples
rexportsummary

Custom Descriptive Statistics Table Export


I have spent a few weeks looking for solutions to my problem of not only finding but creating a descriptive statistics summary table on my data that is exportable to xlsx (ideally). I have only found partial answers and my knowledge in R and R packages is still basic enough to limit my progress. My data set is time series data with 6 columns that have 50,000+ rows.

My DF information:

DateTime:POSIXCT format "YYYY-MM-DD HH:MM:SS"

Var1: num

Var2: num

Var3: num

Var4: factor w/ 2 levels "A","B"

Var5: factor w/ 4 levels "S1","S2","S3","S4"

My objectives are as follows:

  1. Manipulate my data frame using tidyverse to subset my data

  2. Take the subsetted data to create 1 summary table( i.e. in a tibble or data.frame format) with 2 sub factors (Var4 and/or Var5) for Var1, Var2, and Var3. Below is a simplified, visual example of the table I am aiming for:

ExampleTable

  1. Export the summary table (or summary tables if one table is not possible) to xlxs (ideally), .CSV, or .TXT to be used in Excel for stylistic table edits. At the moment, "writexl" package works very well for me as I have problems with "xlsx" and "openxlsx" packages. Here is the code needed to export to xlsx using the openxlsx package: write_xlsx(dataframe, path = "C:/Users/user/Desktop"). Note for MacOS users, path = /Users/admin/yoursubfolder/yoursubfolder.... (fill in "yoursubfolder" with actual folder name on your computer)

What I have done:

  • Used dplyr and the %>% function to manipulate the data without and with factor Var4 or Var5
  • Tried to create a summary table with Var4 as a factor for Var1,Var2, and Var3 (partial success; style is not what I want or it is not exportable to excel)
  • Looked in multiple StackOverflow questions and Google searches with no success to find code that works for my particular case. I've tried qwraps2 to create one and looking into the following packages for something pre-made: psych, stargazer, and HMSIC. I do not like their table styles and they do not all have the option to just show N, mean, StDev, SEM, Min, and Max.
  • I know SEM is not a standard function in most packages; thus, I borrowed this function from an answer on stack overflow because I do not know how to create functions. here is the borrowed code: SEM <- function(x) sd(x)/sqrt(length(x))

Since I cannot attach sample data and my coding is very basic, here is what I could come up with:

Example data:

Unfortunately, I cannot attach sample data for testing. Also due to my limited knowledge of R, I cannot make a perfect data frame. Below is a sample data frame, but I cannot get the factor to be evenly distributed in their respective columns (Sorry). Here is my code:

df <- data.frame(
"DateTime" = seq(c(ISOdate(2018,03,01)), by = "day", length.out = 100),
"Var1" = rnorm(1:100), 
"Var2" = rnorm(1:100), 
"Var3" = rnorm(1:100),
"Var4" = c("A", "B"),
"Var5" = c("S1","S2", "S3", "S4"))

I was trying this:

"S1"[(1:25)],
"S2"[(26:50)], 
"S3"[(51:75)], 
"S4"[(76:100)] # and 
"A"[(1:50], "B"[(51:100)] #but that didn't work, so sorry again.

Despite my lack of proper coding, any guidance, tips, and suggestions from anyone with more experience in R would be greatly appreciated as I do like R and all the capabilities of the software, but I find it very inconvenient that there is no simple, straightforward way to export tables in the console to copy and paste into useful forms like Excel spreadsheets or Word documents instead of standard exporting in LaTex format (which I do not understand at all btw). I know this topic has been discussed in different forums and others share my sentiment on how terrible it is especially for people who need it for data processing instead of document creations like Rmarkdown.


Solution

  • Some example with your df:

    library(dplyr)
    library(tidyr)
    
    SEM_function <- function(x){sd(x)/sqrt(length(x))}
    df %>% as_tibble() %>% 
      gather("Var_num", "value",Var1:Var3) %>% 
      group_by(Var_num, Var4,Var5) %>% 
      summarise("N" = n(), 
              "mean" = mean(value),
              "StDev" = sd(value),
              "SEM" = SEM_function(value) ,
              "min" = min(value),
              "max" = max(value))
    

    Hope this helps