Search code examples
rdata.tablezero

R how tocheck if a data table column is all zeros


I would like to remove the columns that have all zeros. But, some of the columns appear to have non numeric values. How can I remove the non numeric columns, and the columns with all zeros. It would be helpful if the non numeric column name was printed, or the column number, so I can determine if it was ok to remove the column.

Here's what I'm trying, but it doesn't work when the data table has non numeric values.

removeColsAllZeros = function(ddt) {
  m <- as.matrix(ddt)
  # isNumericColList <- lapply(1:ncol(m), function(ii,mm){is.numeric(mm[,ii])}, mm=m)
  # indexNonNumericCols <- which(!unlist(isNumericColList))
  mnz <- m[, colSums(abs(m),na.rm = TRUE) != 0]
  return(mnz)
} 

Solution

  • Both of the other answers were helpful, but they didn't totally answer the question. Here's a function with to identify and remove the non-numeric and all zero columns from a data table. This was helpful and provided additional insight into the data set.

    removeColsAllZeros = function(ddt) {
      # Identify and remove nonnumeric cols and cols with all zeros
      idx_all_zeros = ddt[, lapply(.SD, function(x){ (is.numeric(x) & all(x==0))  })]
      idx_not_numeric = ddt[, lapply(.SD, function(x){ (!is.numeric(x))  })]
      idx_all_zeros = which(unlist(idx_all_zeros))
      idx_not_numeric = which(unlist(idx_not_numeric))
      # Print bad column names
      if (length(idx_all_zeros)>0) {
        cat('Numeric columns with all zeros are\n',paste(names(ddt)[idx_all_zeros],collapse='\n'),'\n')
        flush.console()
      }
      if (length(idx_not_numeric)>0) {
        cat('Nonnumeric columns are\n',paste(names(ddt)[idx_not_numeric],collapse='\n'),'\n')
        flush.console()
      }
      # Determine the numeric columns that have nonzero values
      idx_bad = union(idx_all_zeros, idx_not_numeric)
      idx_good = setdiff(seq(1,ncol(ddt)), idx_bad)
      # Return nonzero numeric data
      ddt[, .SD, .SDcols = idx_good]
    }