r row na

R - Merging rows with numerous NA values to another column

I am looking for a way to transform my data so that consecutive rows containing more than one NA value are combined and placed in a new column on the row before them.

For example:

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(2, NA, NA, 5, NA, NA), C = c(1, 2, NA, 4, 5, NA), D = c(3, NA, 5, NA, NA, NA))

  A  B  C  D
1 1  2  1  3
2 2 NA  2 NA
3 3 NA NA  5
4 4  5  4 NA
5 5 NA  5 NA
6 6 NA NA NA

It should be transformed into this:

  A  B  C  D  E
1 1  2  1  3  2 NA 2 NA 3 NA NA  5
2 4  5  4 NA  5 NA 5 NA 6 NA NA NA

I would like to do the following:

Identify consecutive rows that have more than 1 NA value -> combine the entries from those consecutive rows into a single combined entry
Place the combined entry in a new column "E" on the preceding row

This is quite complex (for me!) and I am wondering if anyone can offer any help with this. I have searched for some similar problems, but have been unable to find one that produces a similar desired output.

Thank you very much for your thoughts.

Solution

Using tidyr and dplyr:

Concatenate values for each row.
Keep the concatenated values only for rows with more than one NA.
Group each “good” row with all following “bad” rows.
Use a grouped summarize() to concatenate “bad” row values to a single string.

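library(dplyr)
library(tidyr)

# Assumes missing values in df are real NA, not the string "NA"
df %>%
  # Step 1: paste every row into one string; unite() writes NA as "NA"
  unite("E", everything(), remove = FALSE, sep = " ") %>%
  mutate(
    # Step 2: keep the string only for rows with more than one NA
    E = if_else(
      rowSums(across(!E, is.na)) > 1,
      E,
      ""
    ),
    # Step 3: each "good" row (empty E) starts a new group
    new_row = cumsum(E == "")
  ) %>%
  group_by(new_row) %>%
  summarize(
    # Keep A:D from the first ("good") row of each group
    across(A:D, first),
    # Step 4: join the "bad" row strings of the group
    E = trimws(paste(E, collapse = " "))
  ) %>%
  select(!new_row)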

# A tibble: 2 × 5
      A     B     C     D E                   
  <dbl> <dbl> <dbl> <dbl> <chr>               
1     1     2     1     3 2 NA 2 NA 3 NA NA 5 
2     4     5     4    NA 5 NA 5 NA 6 NA NA NA
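
For anyone who wants to see the grouping trick without the pipeline, here is a rough base R sketch of the same four steps. It is not part of the answer above. The names bad, grp, row_text and result are made up for illustration, and it assumes the missing values are real NA and that the first row is a "good" row.

```r
# Example data with real NA values (assumption: not the string "NA")
df <- data.frame(
  A = c(1, 2, 3, 4, 5, 6),
  B = c(2, NA, NA, 5, NA, NA),
  C = c(1, 2, NA, 4, 5, NA),
  D = c(3, NA, 5, NA, NA, NA)
)

# A row is "bad" when it holds more than one NA
bad <- rowSums(is.na(df)) > 1

# Every "good" row starts a new group; "bad" rows stay in the group
# of the most recent "good" row above them
grp <- cumsum(!bad)

# One string per row; paste() renders NA as "NA"
row_text <- do.call(paste, df)

# Join the strings of the "bad" rows within each group
E <- tapply(row_text[bad], grp[bad], paste, collapse = " ")

# Keep only the "good" rows and attach the combined string of their group
# (a "good" row with no "bad" rows after it gets NA)
result <- df[!bad, ]
result$E <- as.vector(E[as.character(grp[!bad])])

result$E
# "2 NA 2 NA 3 NA NA 5"     "5 NA 5 NA 6 NA NA NA"
```

The key step is cumsum(!bad): the counter only goes up on a "good" row, so each run of "bad" rows shares a group id with the row it should be attached to.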