Tags: r, mean, standard-deviation

Mean and sd for a vector


I would like to calculate the mean and standard deviation of a column in a table, recalculated each time a new observation is added (a running mean and SD). For example:

library(tidyverse)

aa <- data.frame(aa = c(2, 3, 4, 5, 6, 7, 8)) %>%
  mutate(aa1 = cumsum(aa), li = 1:n()) %>%
  mutate(MeanAA = aa1/li)


aa = c(2, 3, 4, 5, 6, 7, 8)

mean(aa[1:2])
mean(aa[1:3])

sd(aa[1:2])
sd(aa[1:3])

I could do this for the mean but not for the SD. I would like to see how the SD changes relative to the mean as the number of observations increases.


Solution

  • How about this:

    aa <- c(2, 3, 4, 5, 6, 7, 8)
    
    for (i in 2:length(aa)) {
      mn <- mean(aa[1:i])
      ss <- sd(aa[1:i])
      cat(sprintf("1-%i\tMean: %.2f\tSD: %.2f\n", i, mn, ss))
    }
    #> 1-2  Mean: 2.50  SD: 0.71
    #> 1-3  Mean: 3.00  SD: 1.00
    #> 1-4  Mean: 3.50  SD: 1.29
    #> 1-5  Mean: 4.00  SD: 1.58
    #> 1-6  Mean: 4.50  SD: 1.87
    #> 1-7  Mean: 5.00  SD: 2.16
    

    Created on 2022-06-01 by the reprex package (v2.0.1)

    If you need the values in a data.frame, you can compute them inside mutate() like so:

    library(tidyverse)
    tibble(aa = c(2, 3, 4, 5, 6, 7, 8)) %>%
      mutate(
        running_mean = sapply(seq(n()), function(i) mean(aa[seq(i)])),
        running_sd = sapply(seq(n()), function(i) sd(aa[seq(i)])),
      )
    #> # A tibble: 7 x 3
    #>      aa running_mean running_sd
    #>   <dbl>        <dbl>      <dbl>
    #> 1     2          2       NA    
    #> 2     3          2.5      0.707
    #> 3     4          3        1    
    #> 4     5          3.5      1.29 
    #> 5     6          4        1.58 
    #> 6     7          4.5      1.87 
    #> 7     8          5        2.16
    

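Since the question already builds the running mean from cumsum(), the running SD can probably be built the same way. The block below is a minimal sketch, not part of the original answer: it assumes the usual sample-variance identity (sum(x^2) - n * mean^2) / (n - 1), and the names n, running_var and result are made up for illustration.

```r
# Running mean and SD from cumulative sums, with no explicit loop.
aa <- c(2, 3, 4, 5, 6, 7, 8)

n <- seq_along(aa)                 # number of observations seen so far
running_mean <- cumsum(aa) / n

# Sample variance of the first n values:
#   (sum(x^2) - n * mean^2) / (n - 1)
# For n = 1 this evaluates to 0 / 0 (NaN), so set it to NA to match sd().
running_var <- (cumsum(aa^2) - n * running_mean^2) / (n - 1)
running_var[1] <- NA

# pmax() guards against tiny negative values caused by round-off.
running_sd <- sqrt(pmax(running_var, 0))

result <- data.frame(n, aa, running_mean, running_sd)
print(result)
```

To look at how the SD moves with the mean, something like plot(running_mean, running_sd, type = "b") should do. Note that this shortcut formula can lose precision when the values are large compared to their spread; in that case the sapply() version above, which calls sd() directly, is the safer choice.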