
Is there a way in R to sum columns with different patterns of missing observations?


I have some variables that I want to add together, but some of them have missing observations. When I add them, any row with one or more missing values ends up with a missing sum. For example, suppose I have the following, where the last column is the sum I currently get:

df <- matrix(c(23,  NA, 56, NA, NA, 43, 67, NA, 11, 10, 18, 39), byrow = T, nrow = 3)
colnames(df)<- c("X",   "y",    "z",    "sum")
df
      X  y  z sum
[1,] 23 NA 56  NA
[2,] NA 43 67  NA
[3,] 11 10 18  39

Here is what I expect:

df2 <- matrix(c(23, NA, 56, 79,
                 NA,    43, 67, 110,
                 11,    10, 18, 39), byrow = T, nrow = 3)

 colnames(df2)<- c("X", "Y", "Z", "sum")

 df2
      X  Y  Z sum
[1,] 23 NA 56  79
[2,] NA 43 67 110
[3,] 11 10 18  39

How can I get this result?

I am using R version 3.6 on Windows 10.

Solution

  • As Ben pointed out, I think all you want is na.rm = TRUE, so something like this:

    df <- matrix(c(23,  NA, 56, NA, 43, 67, 11, 10, 18), byrow = T, nrow = 3)
    colnames(df)<- c("X",   "y",    "z")
    cbind(df, summ = rowSums(df, na.rm = TRUE))
    #       X  y  z summ
    # [1,] 23 NA 56   79
    # [2,] NA 43 67  110
    # [3,] 11 10 18   39
    

    Or, if you are working with a data frame, something like this:

        library(dplyr)
        df_frame <- data.frame(df)
        df_frame <- df_frame %>%
          mutate(summ = rowSums(., na.rm = TRUE))
        df_frame
        #    X  y  z summ
        # 1 23 NA 56   79
        # 2 NA 43 67  110
        # 3 11 10 18   39
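
If you would rather not load dplyr, the same column can probably be added in base R, since rowSums() also accepts a data frame whose columns are all numeric. This is only a minimal sketch; the name df_base is made up for illustration, and the data is rebuilt so the snippet runs on its own.

```r
# Same example data as above, rebuilt so the snippet is self-contained.
df <- matrix(c(23, NA, 56, NA, 43, 67, 11, 10, 18), byrow = TRUE, nrow = 3)
colnames(df) <- c("X", "y", "z")
df_base <- data.frame(df)  # df_base is a hypothetical name

# rowSums() works directly on an all-numeric data frame;
# na.rm = TRUE drops the missing values before summing each row.
df_base$summ <- rowSums(df_base, na.rm = TRUE)
df_base
#    X  y  z summ
# 1 23 NA 56   79
# 2 NA 43 67  110
# 3 11 10 18   39
```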

    Or this, if you only want to sum the numeric variables in the data frame:
    
        df_frame <- data.frame(df)
        df_frame <- df_frame %>%
          mutate(summ = rowSums(select_if(., is.numeric), na.rm = TRUE))
        df_frame
        #    X  y  z summ
        # 1 23 NA 56   79
        # 2 NA 43 67  110
        # 3 11 10 18   39
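
One thing to watch for with na.rm = TRUE: a row in which every value is missing sums to 0, not NA. If you would prefer NA for such rows, something along these lines may help. It is a sketch under that assumed preference; the matrix m and the names all_missing and summ are made up for illustration and are not part of the question's data.

```r
# Hypothetical data: the second row has no observed values at all.
m <- matrix(c(23, NA, 56,
              NA, NA, NA,
              11, 10, 18), byrow = TRUE, nrow = 3)
colnames(m) <- c("X", "y", "z")

# With na.rm = TRUE, the all-NA row sums to 0.
rowSums(m, na.rm = TRUE)
# [1] 79  0 39

# Assumed preference: report NA when every value in a row is missing.
all_missing <- rowSums(!is.na(m)) == 0
summ <- ifelse(all_missing, NA, rowSums(m, na.rm = TRUE))
summ
# [1] 79 NA 39
```

Rows with at least one observed value are summed exactly as before, so the results for the data in the question are unchanged.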