Problem:
omit the NA values in a data.table random_dt
.
It can have NA values for any of entire row or column.
Requirement:
Display only those columns and rows which have non NA values including zeroes.
Dataset.
library(data.table)
set.seed(2)
num_rows <- sample(3:10, 1);num_cols <- sample(1:10, 1);colnames <- paste0("V", seq_len(num_cols));
random_dt <- as.data.table(matrix(runif(num_rows * num_cols), nrow = num_rows, ncol = num_cols))
na_cols <- sample(c(TRUE, FALSE), num_cols, replace = TRUE)
for (j in seq_along(na_cols)) { if (na_cols[j]) { random_dt[[j]] <- NA }}
na_rows <- sample(0:num_rows, 1)
if (na_rows > 0) { random_dt[(num_rows - na_rows + 1):num_rows, ] <- NA}
setnames(random_dt, colnames)
I have tried these:
> na.omit(random_dt[, lapply(.SD, function(x) replace(x, which(x==0), NA))])
Empty data.table (0 rows and 6 cols): V1,V2,V3,V4,V5,V6
> na.omit(random_dt)
Empty data.table (0 rows and 6 cols): V1,V2,V3,V4,V5,V6
I think you can try
> Filter(\(x) all(!is.na(x)), random_dt[rowMeans(is.na(random_dt)) < 1])
V2 V3 V4 V6
<num> <num> <num> <num>
1: 0.5526741 0.22582546 0.3472722 0.6672256
2: 0.2388948 0.44480923 0.4887732 0.1504698
3: 0.7605133 0.07497942 0.1492469 0.9817279
4: 0.1808201 0.66189876 0.3570626 0.2970107
5: 0.4052822 0.38754954 0.9626440 0.1150841
or
> random_dt[rowMeans(is.na(random_dt)) < 1, Filter(Negate(anyNA), .SD)]
V2 V3 V4 V6
<num> <num> <num> <num>
1: 0.5526741 0.22582546 0.3472722 0.6672256
2: 0.2388948 0.44480923 0.4887732 0.1504698
3: 0.7605133 0.07497942 0.1492469 0.9817279
4: 0.1808201 0.66189876 0.3570626 0.2970107
5: 0.4052822 0.38754954 0.9626440 0.1150841