Tags: r, dataframe, dplyr, magrittr

Removing rows in an R data.table with NAs in specific columns


I have a data.table with a large number of features (columns). I would like to remove the rows that have an NA in any of a specific subset of those columns, ignoring NAs in the other columns.

Currently I am using the following to handle this:

data.joined.sample <- data.joined.sample  %>% 
  filter(!is.na(lat))   %>% 
  filter(!is.na(long))   %>% 
  filter(!is.na(temp))   %>% 
  filter(!is.na(year))   %>% 
  filter(!is.na(month))   %>% 
  filter(!is.na(day))   %>% 
  filter(!is.na(hour))   %>% 
.......

Is there a more concise way to achieve this?

str(data.joined.sample)
Classes ‘data.table’ and 'data.frame':  336776 obs. of  50 variables:

Solution

  • Select the columns of interest, pass them to complete.cases to get a logical vector that is TRUE for rows with no NA in those columns, and use that vector to subset the rows. For a plain data.frame:

    data.joined.sample[complete.cases(data.joined.sample[colsofinterest]),]
    

    where

    colsofinterest <- c("lat", "long", "temp", "year", "month", "day", "hour")
    

    Update

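    Based on the OP's comments, the object is a data.table, where data.joined.sample[colsofinterest] does not select columns. In that case, select the columns with with = FALSE and pass the result to complete.cases: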

    data.joined.sample[complete.cases(data.joined.sample[, colsofinterest, with = FALSE])]
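To check the behaviour on something small, here is a minimal sketch using a made-up four-row table (the data and the `note` column are assumptions for illustration, not the OP's data). It applies the complete.cases approach above and, as an alternative not mentioned in the original answer, the `cols` argument of data.table's `na.omit` method, which should give the same rows.

```r
library(data.table)

# Hypothetical toy data standing in for data.joined.sample
dt <- data.table(
  lat  = c(40.7, NA, 41.2, 39.9),
  long = c(-74.0, -73.9, NA, -75.1),
  temp = c(12.5, 14.1, 13.3, NA),
  note = c(NA, "a", "b", "c")  # not a column of interest; its NA is ignored
)
colsofinterest <- c("lat", "long", "temp")

# Approach from the answer: keep rows with no NA in the selected columns
kept_cc <- dt[complete.cases(dt[, colsofinterest, with = FALSE])]

# Alternative: na.omit() for data.table, restricted to the same columns
kept_na <- na.omit(dt, cols = colsofinterest)
```

Both results keep only the first row: it is the only one without an NA in lat, long, or temp, and the NA in note does not cause it to be dropped.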