Search code examples
rdplyrtidyrmutate

Extract first Non NA value over multiple columns


I'm still learning R and was wondering if I there was an elegant way of manipulating the below df to achieve df2.

I'm not sure if it's a loop that is supposed to be used for this, but basically I want to extract the first Non NA "X_No" Value if the "X_No" value is NA in the first row. This would perhaps be best described through an example from df to the desired df2.

A_ID <- c('A','B','I','N')
A_No <- c(11,NA,15,NA)
B_ID <- c('B','C','D','J')
B_No <- c(NA,NA,12,NA)
C_ID <- c('E','F','G','P')
C_No <- c(NA,13,14,20)
D_ID <- c('J','K','L','M')
D_No <- c(NA,NA,NA,40)
E_ID <- c('W','X','Y','Z')
E_No <- c(50,32,48,40)
df <- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)

ID <- c('A','D','F','M','W')
No <- c(11,12,13,40,50)
df2 <- data.frame(ID,No)


I'm hoping for an elegant solution to this as there are over a 1000 columns similar to the example provided. I've looked all over the web for a similar example however to no avail that would reproduce the expected result.

Your help is very much appreciated. Thankyou


Solution

  • I don't know if I'd call it "elegant", but here is a potential solution:

    library(tidyverse)
    
    A_ID <- c('A','B','I','N')
    A_No <- c(11,NA,15,NA)
    B_ID <- c('B','C','D','J')
    B_No <- c(NA,NA,12,NA)
    C_ID <- c('E','F','G','P')
    C_No <- c(NA,13,14,20)
    D_ID <- c('J','K','L','M')
    D_No <- c(NA,NA,NA,40)
    E_ID <- c('W','X','Y','Z')
    E_No <- c(50,32,48,40)
    df <- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)
    
    ID <- c('A','D','F','M','W')
    No <- c(11,12,13,40,50)
    df2 <- data.frame(ID,No)
    
    output <- df %>%
      pivot_longer(everything(),
                   names_sep = "_",
                   names_to = c("Col", ".value")) %>%
      drop_na() %>%
      group_by(Col) %>%
      slice_head(n = 1) %>%
      ungroup() %>%
      select(-Col)
    
    df2
    #>   ID No
    #> 1  A 11
    #> 2  D 12
    #> 3  F 13
    #> 4  M 40
    #> 5  W 50
    
    output
    #> # A tibble: 5 × 2
    #>   ID       No
    #>   <chr> <dbl>
    #> 1 A        11
    #> 2 D        12
    #> 3 F        13
    #> 4 M        40
    #> 5 W        50
    
    all_equal(df2, output)
    #> [1] TRUE
    

    Created on 2023-02-08 with reprex v2.0.2