I have a data frame with around 300,000 rows and 600 columns. I want to remove all rows that contain the string "NULL" in at least one of the columns. How can I achieve this without using for loops over the rows and columns?
Will data.table work faster than a normal data frame in such a case?
We can use `rowSums` (should be fast):

df1[!rowSums(df1 == 'NULL'), ]
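To see why this works: `df1 == 'NULL'` yields a logical matrix, `rowSums` counts the `TRUE`s per row, and `!` maps any nonzero count to `FALSE`, dropping that row. A minimal sketch (`m` and its values are made up for illustration):

```r
# toy data frame; the name 'm' is hypothetical
m <- data.frame(a = c('x', 'NULL', 'y'),
                b = c('p', 'q',    'NULL'),
                stringsAsFactors = FALSE)

rowSums(m == 'NULL')        # per-row counts of 'NULL': 0, 1, 1
m[!rowSums(m == 'NULL'), ]  # keeps only the first row
```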
Or another option is `any` with `apply`:

df1[!apply(df1 == 'NULL', 1, any), ]
If we need a `data.table` solution:

setDT(df1)[df1[, !Reduce(`+`, lapply(.SD, `==`, 'NULL'))]]
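Here `lapply(.SD, `==`, 'NULL')` compares each column to `'NULL'`, and `Reduce(`+`, ...)` sums those logical vectors element-wise, giving the same per-row count as `rowSums`. A small illustration (`dt` and its values are made up, not the data below):

```r
library(data.table)

dt <- data.table(a = c('x', 'NULL'), b = c('NULL', 'y'))

# per-column logical vectors, summed element-wise across columns
counts <- dt[, Reduce(`+`, lapply(.SD, `==`, 'NULL'))]
counts      # 1 1: each row contains one 'NULL'
dt[!counts] # zero rows survive, since every row had a 'NULL'
```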
If we read the "NULL" elements as `NA` by using `na.strings = 'NULL'` in `read.table`/`read.csv` (as mentioned by @Roland):

setDT(df1)[df1[, !Reduce(`+`, lapply(.SD, is.na))]]
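Once the "NULL" values have become `NA`, base R's `complete.cases` gives the same row filter as the `is.na` reduction above. A sketch (the `df1` values here are made up; the commented `read.csv` call with its file name is hypothetical):

```r
# hypothetical read; na.strings turns the literal string "NULL" into NA
# df1 <- read.csv('file.csv', na.strings = 'NULL', stringsAsFactors = FALSE)

df1 <- data.frame(V1 = c('A', NA, 'B'),
                  V2 = c('C', 'D', NA),
                  stringsAsFactors = FALSE)

df1[complete.cases(df1), ]  # keeps only rows with no NA
```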
The example data used:

set.seed(24)
df1 <- data.frame(V1 = sample(c(LETTERS[1:3], 'NULL'), 20, replace = TRUE),
                  V2 = sample(c(LETTERS[1:5], 'NULL'), 20, replace = TRUE),
                  V3 = sample(c(LETTERS[1:8], 'NULL'), 20, replace = TRUE),
                  stringsAsFactors = FALSE)
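With this data the two base-R approaches can be cross-checked against each other (the data frame is rebuilt here so the snippet runs on its own):

```r
set.seed(24)
df1 <- data.frame(V1 = sample(c(LETTERS[1:3], 'NULL'), 20, replace = TRUE),
                  V2 = sample(c(LETTERS[1:5], 'NULL'), 20, replace = TRUE),
                  V3 = sample(c(LETTERS[1:8], 'NULL'), 20, replace = TRUE),
                  stringsAsFactors = FALSE)

r1 <- df1[!rowSums(df1 == 'NULL'), ]
r2 <- df1[!apply(df1 == 'NULL', 1, any), ]

identical(r1, r2)  # TRUE: both keep exactly the 'NULL'-free rows
any(r1 == 'NULL')  # FALSE: no 'NULL' survives
```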