Search code examples
rdataframemultiple-columnssummary

R - Summary of non NAs for each row


I have a very similar question to this: How to identify which columns are not “NA” per row in a dataframe?

I only want to have the results in multiple columns and I need to have the actual values aswell. Its important that it also works when the values are not numeric, but characters! I have another df where there are stings instead of numbers!

     AFA  AFI  AII  AMA  AMI  AMU  BFA  BFI  BFU  BII
 1: 0.79   NA   NA 0.58   NA   NA   NA   NA 0.75   NA
 2:   NA   NA   NA   NA   NA 0.78   NA -0.5   NA   NA
 3:   NA   NA   NA   NA   NA   NA 0.79 -0.5   NA   NA
 4:   NA   NA   NA   NA   NA   NA   NA -0.5   NA   NA
 5:   NA   NA   NA   NA 0.63   NA   NA   NA   NA   NA
 6:   NA   NA   NA   NA   NA   NA 0.83   NA   NA   NA
 7: 0.63   NA   NA   NA   NA   NA   NA   NA   NA 0.82
 8:   NA   NA   NA   NA 0.63   NA   NA   NA   NA   NA
 9:   NA   NA 0.54 0.59   NA   NA   NA   NA   NA   NA
10:   NA 0.51   NA   NA   NA   NA   NA   NA   NA   NA

Output:

   V1_Code V1_Val V2_Code V2_Val V3_Code V3_Val
1:     AFA   0.79     AMA   0.58     BFU   0.75
2:     AMU   0.78     BFI   -0.5      NA     NA
3:     BFA   0.79     BFI   -0.5      NA     NA
       and so on....

This is my df:

    structure(list(AFA = c(0.79, NA, NA, NA, NA, NA, 0.63, NA, NA, 
    NA), AFI = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.51), AII = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, 0.54, NA), AMA = c(0.58, NA, NA, 
    NA, NA, NA, NA, NA, 0.59, NA), AMI = c(NA, NA, NA, NA, 0.63, 
    NA, NA, 0.63, NA, NA), AMU = c(NA, 0.78, NA, NA, NA, NA, NA, 
    NA, NA, NA), BFA = c(NA, NA, 0.79, NA, NA, 0.83, NA, NA, NA, 
    NA), BFI = c(NA, -0.5, -0.5, -0.5, NA, NA, NA, NA, NA, NA), BFU = c(0.75, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), BII = c(NA, NA, NA, NA, 
NA, NA, 0.82, NA, NA, NA)), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x000001fc6b791ef0>)

Solution

  • library(dplyr)
    library(tidyr)
    
    mydf |>
      mutate(id = row_number()) |> 
      pivot_longer(-id, names_to = "Code", values_to = "Val") |> 
      drop_na() |> 
      group_by(id) |> 
      mutate(col_num = row_number()) |> 
      ungroup() |> 
      pivot_wider(id_cols = id, values_from = c(Code, Val), names_from = col_num,
                  names_glue = "V{col_num}_{.value}") |> 
      select(-id) |> 
      relocate(sort(tidyselect::peek_vars()))
        
    # A tibble: 10 x 6
       V1_Code V1_Val V2_Code V2_Val V3_Code V3_Val
       <chr>    <dbl> <chr>    <dbl> <chr>    <dbl>
     1 AFA       0.79 AMA       0.58 BFU       0.75
     2 AMU       0.78 BFI      -0.5  NA       NA   
     3 BFA       0.79 BFI      -0.5  NA       NA   
     4 BFI      -0.5  NA       NA    NA       NA   
     5 AMI       0.63 NA       NA    NA       NA   
     6 BFA       0.83 NA       NA    NA       NA   
     7 AFA       0.63 BII       0.82 NA       NA   
     8 AMI       0.63 NA       NA    NA       NA   
     9 AII       0.54 AMA       0.59 NA       NA   
    10 AFI       0.51 NA       NA    NA       NA