R - Merging rows with numerous NA values to another column


I would like to ask the R community for help with my data: consecutive rows that contain more than one NA value should be combined and placed in a new column.

For example:

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(2, NA, NA, 5, NA, NA), C = c(1, 2, NA, 4, 5, NA), D = c(3, NA, 5, NA, NA, NA))

  A  B  C  D
1 1  2  1  3
2 2 NA  2 NA
3 3 NA NA  5
4 4  5  4 NA
5 5 NA  5 NA
6 6 NA NA NA

It should be transformed to this:

  A  B  C  D  E
1 1  2  1  3  2 NA 2 NA 3 NA NA  5
2 4  5  4 NA  5 NA 5 NA 6 NA NA NA

I would like to do the following:

  1. Identify consecutive rows that have more than one NA value, then combine the entries from those rows into a single combined entry
  2. Place the combined entry in a new column "E" on the prior row

This is quite complex (for me!) and I am wondering if anyone can offer any help with this. I have searched for some similar problems, but have been unable to find one that produces a similar desired output.

Thank you very much for your thoughts.


Solution

  • Using tidyr and dplyr:

    1. Concatenate values for each row.
    2. Keep the concatenated values only for rows with more than one NA.
    3. Group each “good” row with all following “bad” rows.
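    4. Use a grouped summarize() to concatenate “bad” row values to a single string.

    library(dplyr)
    library(tidyr)

    # Assumes df holds real NA values, not the string "NA"
    df %>%
      # 1. Paste all columns of each row into one string, keeping the originals
      unite("E", everything(), remove = FALSE, sep = " ") %>%
      mutate(
        # 2. Keep the string only for rows with more than one NA
        E = if_else(
          rowSums(across(!E, is.na)) > 1,
          E,
          ""
        ),
        # 3. Each "good" row (empty E) starts a new group
        new_row = cumsum(E == "")
      ) %>%
      group_by(new_row) %>%
      # 4. Keep the first ("good") row's values and join the "bad" strings
      summarize(
        across(A:D, first),
        E = trimws(paste(E, collapse = " "))
      ) %>%
      select(!new_row)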
    
    # A tibble: 2 × 5
          A     B     C     D E                   
      <dbl> <dbl> <dbl> <dbl> <chr>               
    1     1     2     1     3 2 NA 2 NA 3 NA NA 5 
    2     4     5     4    NA 5 NA 5 NA 6 NA NA NA
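
For comparison, here is a minimal base R sketch of the same four steps, without tidyr or dplyr. It is not the code from the answer above, only one possible way to get the same result. The names bad, grp, row_str and out are made up for illustration. It assumes the data uses real NA values (not the string "NA") and that the first row is a "good" row.

```r
# Example data with real missing values (NA), not the string "NA"
df <- data.frame(
  A = c(1, 2, 3, 4, 5, 6),
  B = c(2, NA, NA, 5, NA, NA),
  C = c(1, 2, NA, 4, 5, NA),
  D = c(3, NA, 5, NA, NA, NA)
)

# Flag "bad" rows, i.e. rows with more than one NA
bad <- rowSums(is.na(df)) > 1
# FALSE TRUE TRUE FALSE TRUE TRUE

# Each "good" row starts a new group; the bad rows after it share its id
grp <- cumsum(!bad)
# 1 1 1 2 2 2

# One string per row; paste() renders a missing value as "NA"
row_str <- do.call(paste, df)

# Join the bad-row strings per group; a good row with no bad rows
# after it gets an empty string
E <- vapply(
  split(row_str[bad], factor(grp[bad], levels = grp[!bad])),
  paste, character(1), collapse = " "
)

# Keep only the good rows and attach the joined strings as column E
out <- df[!bad, ]
out$E <- unname(E)
rownames(out) <- NULL
out
```

Printing bad and grp on their own is a quick way to check which rows get merged into which "good" row before building E.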