I would like to remove the columns that have all zeros. But, some of the columns appear to have non numeric values. How can I remove the non numeric columns, and the columns with all zeros. It would be helpful if the non numeric column name was printed, or the column number, so I can determine if it was ok to remove the column.
Here's what I'm trying, but it doesn't work when the data table has non numeric values.
removeColsAllZeros = function(ddt) {
m <- as.matrix(ddt)
# isNumericColList <- lapply(1:ncol(m), function(ii,mm){is.numeric(mm[,ii])}, mm=m)
# indexNonNumericCols <- which(!unlist(isNumericColList))
mnz <- m[, colSums(abs(m),na.rm = TRUE) != 0]
return(mnz)
}
Both of the other answers were helpful, but they didn't totally answer the question. Here's a function with to identify and remove the non-numeric and all zero columns from a data table. This was helpful and provided additional insight into the data set.
removeColsAllZeros = function(ddt) {
# Identify and remove nonnumeric cols and cols with all zeros
idx_all_zeros = ddt[, lapply(.SD, function(x){ (is.numeric(x) & all(x==0)) })]
idx_not_numeric = ddt[, lapply(.SD, function(x){ (!is.numeric(x)) })]
idx_all_zeros = which(unlist(idx_all_zeros))
idx_not_numeric = which(unlist(idx_not_numeric))
# Print bad column names
if (length(idx_all_zeros)>0) {
cat('Numeric columns with all zeros are\n',paste(names(ddt)[idx_all_zeros],collapse='\n'),'\n')
flush.console()
}
if (length(idx_not_numeric)>0) {
cat('Nonnumeric columns are\n',paste(names(ddt)[idx_not_numeric],collapse='\n'),'\n')
flush.console()
}
# Determine the numeric columns that have nonzero values
idx_bad = union(idx_all_zeros, idx_not_numeric)
idx_good = setdiff(seq(1,ncol(ddt)), idx_bad)
# Return nonzero numeric data
ddt[, .SD, .SDcols = idx_good]
}