
R - Mean and standard deviation over dataframe with NAs based on criteria


Given the following dataframe:

df1 <- data.frame(Company = c('A','B','A','D','D','F'),
                  `X1980` = c(21,NA,53,57,11,26),
                  `X1981` = c(35,33,45,NA,NA,12),
                  `X1982` = c(3,7,10,23,9,53),
                  `X1984` = c(10,50,33,25,4,6))

I would first like to calculate the mean for each company, pooling all available (non-NA) values across every year column and every row belonging to that company. For example, for company D: (57 + 23 + 25 + 11 + 9 + 4)/6 = 21.5

I would also like to calculate the standard deviation for each company, using the same variables as for the mean.
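
For reference, the worked example for company D can be reproduced by pooling its values by hand. This is only a minimal sketch: the vector name `d_values` is made up for illustration and is not part of the data frame above.

```r
# Company D's values across both of its rows, in row order (NA = missing)
d_values <- c(57, NA, 23, 25, 11, NA, 9, 4)

# na.rm = TRUE drops the NAs, so both statistics use the 6 observed values
d_mean <- mean(d_values, na.rm = TRUE)  # 129 / 6 = 21.5
d_sd   <- sd(d_values, na.rm = TRUE)    # sample standard deviation (n - 1 denominator)

d_mean
d_sd
```

Note that `sd()` computes the sample standard deviation, so the divisor here is 5 rather than 6.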

Ideally, this would produce the following data frame, where x stands for the computed value:

 Company  mean   sd
    A      x     x
    B      x     x
    D      x     x
    F      x     x

At the moment I'm doing all the calculations manually by creating new data frames that build row sums and count the observations for each company. But this seems rather inelegant!

Any help would be greatly appreciated!


Solution

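  • Reshape the data to long format, keeping Company as the identifier column, then group by Company and summarise. The na.rm argument excludes the missing values from both statistics, which gives the expected output: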

    library(tidyr)
    library(dplyr)
    # Reshape every column except Company to long format, then summarise per company
    new <- df1 %>%
      pivot_longer(-Company) %>%
      group_by(Company) %>%
      summarise(Mean = mean(value, na.rm = TRUE),
                SD = sd(value, na.rm = TRUE))
    

    Output:

    # A tibble: 4 x 3
      Company  Mean    SD
      <chr>   <dbl> <dbl>
    1 A        26.2  18.1
    2 B        30    21.7
    3 D        21.5  19.2
    4 F        24.2  20.9
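
If loading tidyr and dplyr is not an option, the same pooling can probably be done in base R as well. The following is a sketch rather than part of the answer above; the names `pieces`, `stats` and `result` are illustrative. It splits the year columns by company, flattens each piece into a single vector, and computes both statistics on it.

```r
df1 <- data.frame(Company = c('A','B','A','D','D','F'),
                  X1980 = c(21,NA,53,57,11,26),
                  X1981 = c(35,33,45,NA,NA,12),
                  X1982 = c(3,7,10,23,9,53),
                  X1984 = c(10,50,33,25,4,6))

# Split the year columns by company; each piece is a small data frame
pieces <- split(df1[-1], df1$Company)

# Pool every value in a piece into one vector, then summarise it
stats <- t(sapply(pieces, function(p) {
  v <- unlist(p)
  c(mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE))
}))

# Turn the matrix (one row per company) into a regular data frame
result <- data.frame(Company = rownames(stats), stats, row.names = NULL)
result
```

The result should have one row per company with columns `Company`, `mean` and `sd`, matching the tibble above up to rounding in the printed display.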