Search code examples
rdata.tablena

R: Omit NA values from a data.table


Problem:

omit the NA values in a data.table random_dt. It can have NA values for any of entire row or column.

Requirement:

Display only those columns and rows which have non NA values including zeroes.

Dataset.

library(data.table)
set.seed(2)
num_rows <- sample(3:10, 1);num_cols <- sample(1:10, 1);colnames <- paste0("V", seq_len(num_cols));
random_dt <- as.data.table(matrix(runif(num_rows * num_cols), nrow = num_rows, ncol = num_cols))
na_cols <- sample(c(TRUE, FALSE), num_cols, replace = TRUE)
for (j in seq_along(na_cols)) {  if (na_cols[j]) {    random_dt[[j]] <- NA  }}
na_rows <- sample(0:num_rows, 1)
if (na_rows > 0) {  random_dt[(num_rows - na_rows + 1):num_rows, ] <- NA}
setnames(random_dt, colnames)

I have tried these:

> na.omit(random_dt[, lapply(.SD, function(x) replace(x, which(x==0), NA))])
Empty data.table (0 rows and 6 cols): V1,V2,V3,V4,V5,V6
> na.omit(random_dt)
Empty data.table (0 rows and 6 cols): V1,V2,V3,V4,V5,V6

Solution

  • I think you can try

    > Filter(\(x) all(!is.na(x)), random_dt[rowMeans(is.na(random_dt)) < 1])
              V2         V3        V4        V6
           <num>      <num>     <num>     <num>
    1: 0.5526741 0.22582546 0.3472722 0.6672256
    2: 0.2388948 0.44480923 0.4887732 0.1504698
    3: 0.7605133 0.07497942 0.1492469 0.9817279
    4: 0.1808201 0.66189876 0.3570626 0.2970107
    5: 0.4052822 0.38754954 0.9626440 0.1150841
    

    or

    > random_dt[rowMeans(is.na(random_dt)) < 1, Filter(Negate(anyNA), .SD)]
              V2         V3        V4        V6
           <num>      <num>     <num>     <num>
    1: 0.5526741 0.22582546 0.3472722 0.6672256
    2: 0.2388948 0.44480923 0.4887732 0.1504698
    3: 0.7605133 0.07497942 0.1492469 0.9817279
    4: 0.1808201 0.66189876 0.3570626 0.2970107
    5: 0.4052822 0.38754954 0.9626440 0.1150841