Search code examples
rdataframemeanna

replace NAs in data frame with 'average if' of row


I have some data where each unique ID is a member of a group. There are some IDs with missing data, for these I'd like to take the average of the other members of the same group for that row.

For example, with the below data I'd like to replace the "NA" for id 3 in row V_2 with the average of the other Group A members for that row (average of 21 & 22). Similarly for id 7 in row V_3 it would be the average of 34 & 64.

Group=rep(c('A', 'B', 'C'), each=3)
id=1:9
V_1 = t(c(10,20,30,40,10,10,20,35,65))
V_2 = t(c(21,22,"NA",42,12,12,22,32,63))
V_3 = t(c(24,24,34,44,14,14,"NA",34,64))

df <- as.data.frame(rbind(Group, id, V_1, V_2, V_3))
df

Group  A  A  A  B  B  B  C  C  C
id     1  2  3  4  5  6  7  8  9
X     10 20 30 40 10 10 20 35 65
X.1   21 22 NA 42 12 12 22 32 63
X.2   24 24 34 44 14 14 NA 34 64

Solution

  • An approach using dplyr. The warnings occur because data frame columns are all character in your example (because the character class Group is in row 1). So ideally the whole data frame should be transposed...

    library(dplyr)
    library(tidyr)
    
    tibble(data.frame(t(df))) %>% 
      group_by(Group) %>% 
      mutate(across(X:X.2, ~ as.numeric(.x))) %>% 
      mutate(across(X:X.2, ~ replace_na(.x, mean(.x, na.rm=T)))) %>% 
      t() %>% 
      as.data.frame()
            V1   V2   V3   V4   V5   V6   V7   V8   V9
    Group    A    A    A    B    B    B    C    C    C
    id       1    2    3    4    5    6    7    8    9
    X       10   20   30   40   10   10   20   35   65
    X.1   21.0 22.0 21.5 42.0 12.0 12.0 22.0 32.0 63.0
    X.2     24   24   34   44   14   14   49   34   64
    Warning messages:
    1: Problem while computing `..1 = across(X:X.2, ~as.numeric(.x))`.
    ℹ NAs introduced by coercion
    ℹ The warning occurred in group 1: Group = "A". 
    2: Problem while computing `..1 = across(X:X.2, ~as.numeric(.x))`.
    ℹ NAs introduced by coercion
    ℹ The warning occurred in group 3: Group = "C".
    
    Same example using transposed data
    df_t %>% 
      group_by(Group) %>% 
      mutate(across(X:X.2, ~ replace_na(.x, mean(.x, na.rm=T)))) %>%
      ungroup()
    # A tibble: 9 × 5
      Group    id     X   X.1   X.2
      <chr> <dbl> <dbl> <dbl> <dbl>
    1 A         1    10  21      24
    2 A         2    20  22      24
    3 A         3    30  21.5    34
    4 B         4    40  42      44
    5 B         5    10  12      14
    6 B         6    10  12      14
    7 C         7    20  22      49
    8 C         8    35  32      34
    9 C         9    65  63      64
    

    with transpose back to wider format

    df_t %>% 
      group_by(Group) %>% 
      mutate(across(X:X.2, ~ replace_na(.x, mean(.x, na.rm=T)))) %>%
      t() %>%
      as.data.frame()
            V1   V2   V3   V4   V5   V6   V7   V8   V9
    Group    A    A    A    B    B    B    C    C    C
    id       1    2    3    4    5    6    7    8    9
    X       10   20   30   40   10   10   20   35   65
    X.1   21.0 22.0 21.5 42.0 12.0 12.0 22.0 32.0 63.0
    X.2     24   24   34   44   14   14   49   34   64
    

    transposed data

    df_t <- structure(list(Group = c("A", "A", "A", "B", "B", "B", "C", "C", 
    "C"), id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), X = c(10, 20, 30, 40, 
    10, 10, 20, 35, 65), X.1 = c(21, 22, NA, 42, 12, 12, 22, 32, 
    63), X.2 = c(24, 24, 34, 44, 14, 14, NA, 34, 64)), class = "data.frame", row.names = c("V1", 
    "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"))