
Removing rows that contain NA values also removes all rows that contain values


All rows for United States got removed here: [image: data_after_na.omit]. Real data: [image: data].

Hi, I'm having an issue: I want to omit rows that contain NA values, but once I use na.omit() the entire data frame is gone. I think I am doing something wrong. I want to keep the rows that have a value in the Carbon_Footprint_percapita column and delete the rows where that column is NA.

my code is:

    us_df <- country_data %>%
      filter(country == "United States")
    
    us_df <- us_df %>%
      na.omit()

any help is appreciated. thank you!

I tried complete.cases() and drop_na(), but one returns a list and the other does the same as the code above. I don't know what I'm doing wrong.


Solution

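  • na.omit() drops every row that has an NA in any column, so you probably have NA values in columns other than Carbon_Footprint_percapita. tidyr::drop_na() will work fine in your case; you just need to pass it the column name, e.g. drop_na(Carbon_Footprint_percapita), so that only that column is checked.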

    Here it is on dummy data. The first call works as desired, while the second doesn't: na.omit() ignores the extra argument and still checks every column.

    library(tidyverse)
    
    df <- data.frame(
      id = 1:5,
      data1 = c(NA, 11:13, NA),
      data2 = c(100, NA, NA, NA, 115)
    )
    
    df
    #>   id data1 data2
    #> 1  1    NA   100
    #> 2  2    11    NA
    #> 3  3    12    NA
    #> 4  4    13    NA
    #> 5  5    NA   115
    
    # drop_na(data1) removes only the rows where data1 is NA
    df %>% 
      drop_na(data1)
    #>   id data1 data2
    #> 1  2    11    NA
    #> 2  3    12    NA
    #> 3  4    13    NA
    
    # na.omit() ignores the data1 argument and removes rows with NA in any column
    df %>% 
      na.omit(data1)
    #> [1] id    data1 data2
    #> <0 rows> (or 0-length row.names)
    

    Created on 2024-03-05 with reprex v2.0.2
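
Applied to the question's pipeline, the same idea might look like the sketch below. The data frame here is a made-up stand-in: only the names country_data, country and Carbon_Footprint_percapita come from the question, while year, other_metric and all the values are assumptions for illustration. The base R variants are included because complete.cases() returns a logical vector (one TRUE/FALSE per row), which is meant to be used for row indexing rather than as a result on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for country_data (values are invented).
# other_metric is NA in every United States row, which is the kind of
# column that makes na.omit() drop all of them.
country_data <- data.frame(
  country = c("United States", "United States", "United States", "Canada"),
  year = c(2000, 2001, 2002, 2000),
  Carbon_Footprint_percapita = c(NA, 20.1, 19.8, 16.5),
  other_metric = c(NA, NA, NA, 3.2)
)

us_all <- country_data %>%
  filter(country == "United States")

# na.omit() checks every column, so no United States row survives
us_omit <- na.omit(us_all)

# drop_na() with a column name checks only that column
us_df <- us_all %>%
  drop_na(Carbon_Footprint_percapita)

# Base R equivalents: index rows with a logical vector
us_base <- us_all[!is.na(us_all$Carbon_Footprint_percapita), ]
us_cc <- us_all[complete.cases(us_all$Carbon_Footprint_percapita), ]

nrow(us_omit)
nrow(us_df)
```

With this stand-in data, us_omit is empty while us_df, us_base and us_cc each keep the two rows that have a Carbon_Footprint_percapita value, even though other_metric is still NA in those rows.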