Search code examples
rdataframeisnull

Fastest way to check if each row contains some value in dataframe in R?


I have a dataframe with around 300000 rows and 600 columns. I want to remove all rows which contain a sting "NULL" in at-least one of the columns. How would I achieve it without using for loops for rows and columns?

Will datatables work faster than normal dataframe in such case?


Solution

  • We can use rowSums (should be fast)

    df1[!rowSums(df1=='NULL'),]
    

    Or another option is any with apply

    df1[!apply(df1=='NULL', 1, any),]
    

    If we need a data.table solution,

    setDT(df1)[df1[,!Reduce(`+` , lapply(.SD, `==`, 'NULL'))]]
    

    If we are reading the NULL elements as NA using na.strings='NULL' in theread.table/read.csv` (as mentioned by @Roland)

     setDT(df1)[df1[, !Reduce(`+`,lapply(.SD, is.na))]]
    

    data

    set.seed(24)
    df1 <- data.frame(V1= sample(c(LETTERS[1:3],'NULL'), 20, replace=TRUE), 
            V2= sample(c(LETTERS[1:5], 'NULL'), 20, replace=TRUE),
           V3= sample(c(LETTERS[1:8], 'NULL'), 20, replace=TRUE), stringsAsFactors=FALSE)