Search code examples
rdataframenanreshape2

remove missing values from dataframe and reshape the dataframe with the non-missing ones


I have a dataframe that has missing values at each column, but at different rows. For simplicity, let's see the following dataframe (real dataframe is much more complex):

first_column <- c(1, 2, NA,NA)
second_column <- c(NA, NA, 4,9)

df <- data.frame(first_column, second_column)

and we get:

  first_column second_column
1            1            NA
2            2            NA
3           NA             4
4           NA             9

Now, I want to reshape the dataframe, after removing these missing values. I want the following:

  first_column second_column
1            1            4
2            2            9

Is there an automatic way to do it (real dataframe has dimensions 1800 x 33)?


Solution

  • We may have to reorder the column values so that the NAs are at the end and then use drop_na

    library(dplyr)
    library(tidyr)
    df %>%
       summarise(across(everything(), ~ .x[order(is.na(.x))])) %>% 
       drop_na()
    

    -output

       first_column second_column
    1            1             4
    2            2             9
    

    If there are unequal distribution of NAs in each column and wants to remove the row only if all the columns have NA at a particular row after ordering, use if_all/if_all in filter

    df %>%
       mutate(across(everything(), ~ .x[order(is.na(.x))])) %>%
       filter(if_any(everything(), complete.cases))
    

    -output

      first_column second_column
    1            1             4
    2            2             9