I have spent a few weeks looking for solutions to my problem of not only finding but creating a descriptive statistics summary table on my data that is exportable to xlsx (ideally). I have only found partial answers and my knowledge in R and R packages is still basic enough to limit my progress. My data set is time series data with 6 columns that have 50,000+ rows.
My DF information:
DateTime:POSIXCT format "YYYY-MM-DD HH:MM:SS"
Var1: num
Var2: num
Var3: num
Var4: factor w/ 2 levels "A","B"
Var5: factor w/ 4 levels "S1","S2","S3","S4"
My objectives are as follows:
Manipulate my data frame using tidyverse to subset my data
Take the subsetted data to create 1 summary table( i.e. in a tibble or data.frame format) with 2 sub factors (Var4 and/or Var5) for Var1, Var2, and Var3. Below is a simplified, visual example of the table I am aiming for:
write_xlsx(dataframe, path = "C:/Users/user/Desktop")
. Note for MacOS users, path = /Users/admin/yoursubfolder/yoursubfolder.... (fill in "yoursubfolder" with actual folder name on your computer)What I have done:
SEM <- function(x) sd(x)/sqrt(length(x))
Since I cannot attach sample data and my coding is very basic, here is what I could come up with:
Example data:
Unfortunately, I cannot attach sample data for testing. Also due to my limited knowledge of R, I cannot make a perfect data frame. Below is a sample data frame, but I cannot get the factor to be evenly distributed in their respective columns (Sorry). Here is my code:
df <- data.frame(
"DateTime" = seq(c(ISOdate(2018,03,01)), by = "day", length.out = 100),
"Var1" = rnorm(1:100),
"Var2" = rnorm(1:100),
"Var3" = rnorm(1:100),
"Var4" = c("A", "B"),
"Var5" = c("S1","S2", "S3", "S4"))
I was trying this:
"S1"[(1:25)],
"S2"[(26:50)],
"S3"[(51:75)],
"S4"[(76:100)] # and
"A"[(1:50], "B"[(51:100)] #but that didn't work, so sorry again.
Despite my lack of proper coding, any guidance, tips, and suggestions from anyone with more experience in R would be greatly appreciated as I do like R and all the capabilities of the software, but I find it very inconvenient that there is no simple, straightforward way to export tables in the console to copy and paste into useful forms like Excel spreadsheets or Word documents instead of standard exporting in LaTex format (which I do not understand at all btw). I know this topic has been discussed in different forums and others share my sentiment on how terrible it is especially for people who need it for data processing instead of document creations like Rmarkdown.
Some example with your df:
library(dplyr)
library(tidyr)
SEM_function <- function(x){sd(x)/sqrt(length(x))}
df %>% as_tibble() %>%
gather("Var_num", "value",Var1:Var3) %>%
group_by(Var_num, Var4,Var5) %>%
summarise("N" = n(),
"mean" = mean(value),
"StDev" = sd(value),
"SEM" = SEM_function(value) ,
"min" = min(value),
"max" = max(value))
Hope this helps