
Take Symmetrical Mean of a tibble (ignoring the NAs)


I have a tibble whose rows and columns are the same IDs. I would like to make it symmetric by replacing each pair of mirrored cells with their mean, ignoring NAs (so if only one of the two cells has a value, that value is used). I am struggling to see how.

data <- tibble(group = LETTERS[1:4], 
               A = c(NA, 10, 20, NA),
               B = c(15, NA, 25, 30),
               C = c(20, NA, NA, 10),
               D = c(10, 12, 15, NA)
               )

I would normally do

A <- as.matrix(data[-1])
(A + t(A))/2

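But this does not work because of the NAs: any sum involving an NA is itself NA, so a pair with only one missing cell becomes NA instead of keeping the value that is present.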
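
To make the failure concrete, here is a small base-R sketch. It assumes the same values as the question, built directly with `cbind()` instead of going through the tibble; the name `naive` is only for illustration.

```r
# Same values as as.matrix(data[-1]) in the question (rows are groups A..D)
A <- cbind(A = c(NA, 10, 20, NA),
           B = c(15, NA, 25, 30),
           C = c(20, NA, NA, 10),
           D = c(10, 12, 15, NA))

naive <- (A + t(A)) / 2

# (A, D) pairs 10 with NA, so the result is NA rather than the desired 10
naive[1, 4]

# (A, B) has both values present, so it averages as expected: (15 + 10) / 2
naive[1, 2]
```

Only pairs with both cells present survive the naive approach; every pair with a single missing cell is lost.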

Edit: below is the expected output.

output <- tibble(group = LETTERS[1:4],
                 A = c(NA, 12.5, 20, 10),
                 B = c(12.5, NA, 25, 21),
                 C = c(20, 25, NA, 12.5),
                 D = c(10, 21, 12.5, NA))

Solution

  • This is how I ended up doing it. I would have preferred not to use a for loop, because my actual data is much bigger, but beggars can't be choosers!

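    library(dplyr)  # provides %>%, as_tibble(), mutate(), select()

    A <- as.matrix(data[-1])
    
    # Fill each missing cell from its mirrored cell (stays NA if both are missing)
    for (i in seq_len(nrow(A))) {
      for (j in seq_len(ncol(A))) {
        if (is.na(A[i, j])) {
          A[i, j] <- A[j, i]
        }
      }
    }
    
    # With the gaps filled, the usual symmetric mean works
    output <- (A + t(A)) / 2
    output %>% 
      as_tibble() %>% 
      mutate(group = data$group) %>% 
      select(group, everything())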
    
    
    # A tibble: 4 x 5
      group     A     B     C     D
      <chr> <dbl> <dbl> <dbl> <dbl>
    1 A      NA    12.5  20    10  
    2 B      12.5  NA    25    21  
    3 C      20    25    NA    12.5
    4 D      10    21    12.5  NA
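
A loop-free variant of the same idea is possible. The following is only a sketch in base R, not a benchmarked solution: it assumes a square numeric matrix like the one above, and the names `filled` and `sym` are made up for illustration. `ifelse()` does the NA fill for the whole matrix at once, taking each missing cell from the transpose.

```r
# Same values as as.matrix(data[-1]) in the question (rows are groups A..D)
A <- cbind(A = c(NA, 10, 20, NA),
           B = c(15, NA, 25, 30),
           C = c(20, NA, NA, 10),
           D = c(10, 12, 15, NA))

# Where a cell is NA, take its mirrored cell instead; cells that are
# missing on both sides (here only the diagonal) stay NA
filled <- ifelse(is.na(A), t(A), A)

# Both halves are now filled, so the usual symmetric mean applies
sym <- (filled + t(filled)) / 2
dimnames(sym) <- list(colnames(A), colnames(A))

sym
```

The result can be turned back into a tibble with the same `as_tibble()` / `mutate()` / `select()` steps used above. Because the fill is vectorized, it should scale better than the nested loop on a large matrix, though that has not been measured here.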