r, reshape2, summary, dplyr

Summarise a vector and then append the summary statistics to the original dataframe in R


Intro:

I would like to compute the mean, standard deviation, and standard error of a numeric column in a given dataframe, store each of these summary statistics in a new column, and then append the three new columns to the original dataframe.

Example Code:

## Creating our dataframe:
datetime <- c("5/12/2017 16:15:00","5/16/2017 16:45:00","5/19/2017 17:00:00")
datetime <- as.POSIXct(datetime, format = "%m/%d/%Y %H:%M:%S")
values <- c(1,2,3)
df <- data.frame(datetime, values)

## Here's the current output:
head(df)
             datetime values
1 2017-05-12 16:15:00      1
2 2017-05-16 16:45:00      2
3 2017-05-19 17:00:00      3

## And here's the desired output:
head(df1)
             datetime values mean    sd    se
1 2017-05-12 16:15:00      1    2 0.816 0.471
2 2017-05-16 16:45:00      2    2 0.816 0.471
3 2017-05-19 17:00:00      3    2 0.816 0.471

Thanks in advance!

For those who are curious about why I am trying to do this: I am following this tutorial. I need to make one of those line plots with error bars for some calibrations between a low-cost sensor and an expensive reference instrument.


Solution

  • You can assign all three columns in a single step. Suppose you already have helper functions for your choice of sd and se:

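    # Population SD: rescales sd() from an n - 1 to an n denominator (gives 0.816 for 1, 2, 3)
    sd0 <- function(x){sd(x) / sqrt(length(x)) * sqrt(length(x) - 1)}
    # Standard error of the mean, based on sd0
    se0 <- function(x){ sd0(x) / sqrt(length(x))}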
    

    Then you can try:

    df[c('mean', 'sd', 'se')] <- lapply(list(mean, sd0, se0), function(f) f(df$values))
    # > df
    #              datetime values mean        sd        se
    # 1 2017-05-12 16:15:00      1    2 0.8164966 0.4714045
    # 2 2017-05-16 16:45:00      2    2 0.8164966 0.4714045
    # 3 2017-05-19 17:00:00      3    2 0.8164966 0.4714045
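
As a possible alternative, the same result can be sketched with base R's transform(), which builds a new dataframe (called df1 here, to match the name in the question) instead of modifying df in place. The example below rebuilds the question's data and repeats the sd0 and se0 helpers so that it runs on its own. It is only one way to write this, not necessarily the approach the tutorial expects.

```r
# Rebuild the example dataframe from the question
datetime <- as.POSIXct(
  c("5/12/2017 16:15:00", "5/16/2017 16:45:00", "5/19/2017 17:00:00"),
  format = "%m/%d/%Y %H:%M:%S"
)
values <- c(1, 2, 3)
df <- data.frame(datetime, values)

# Helper functions from the answer above:
# sd0 is the population SD (n denominator), se0 is the standard error based on it
sd0 <- function(x) sd(x) / sqrt(length(x)) * sqrt(length(x) - 1)
se0 <- function(x) sd0(x) / sqrt(length(x))

# transform() evaluates each expression against the columns of df;
# each scalar result is recycled to fill every row of the new column
df1 <- transform(df,
                 mean = mean(values),
                 sd   = sd0(values),
                 se   = se0(values))

print(df1)
```

Since the question is tagged dplyr: dplyr::mutate() accepts the same three name = expression arguments, so mutate(df, mean = mean(values), sd = sd0(values), se = se0(values)) should give an equivalent dataframe, assuming the dplyr package is installed.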