Tags: r, coercion, readxl

Detect a row that contains names


I need to detect whether my first row of observations is a row of names. Whenever I import the data from the spreadsheet (readxl package), every column comes in as a character column.

Given the structure of the data, a non-name row always contains at least 8 numeric values.

rowNoName <- c("23-234", "Bank of Wisdom", 1:8)
rowName <- c("code of acc", "name of acc", "ac", "li", "ui", "op", "o", "p", " e", "i")

So, following this logic, I use coercion to do my task. A character element that was originally numeric converts cleanly, but an element that was originally a text string fails to convert and yields NA. The rule is:

testName <- function(row) {
  # count the entries that convert to a number; text entries become NA
  if (sum(!is.na(as.numeric(row))) >= 8) {
    print("row without names")
  } else {
    print("row with names")
  }
}

This function solves the problem, but is there a more formal way to do this? I mean, a way to avoid the coercion warning message in the output.

> testName(row)
[1] "row with names"
Warning message:
In testName(row) : NAs introduced by coercion

Solution

  • Test cases:

    rowNoName <- c("23-234", "Bank of Wisdom", 1:8)
    rowName <- c("code of acc", "name of acc", 
       "ac", "li", "ui", "op", "o", "p", " e", "i")
    

    Your approach:

    testName0 <- function(row) {
       sum(!is.na(as.numeric(row)))>=8
    }
    testName0(rowNoName)
    testName0(rowName)
    

    The simplest way to do this is to wrap the condition in suppressWarnings():

    testName1 <- function(row) {
       suppressWarnings(sum(!is.na(as.numeric(row)))>=8)
    }
    testName1(rowNoName) 
    testName1(rowName)
    

    Unfortunately, suppressWarnings() suppresses all warnings, and as far as I know there is no simple way to filter on a particular warning: warnings in R do not have associated unique codes, and warning texts may be translated into other languages. For example, if for some crazy reason row ended up holding a complex number, sum(!is.na(as.numeric(2+3i))) would give the warning "imaginary parts discarded in coercion", and that warning would be suppressed too, even though you would probably want to see it.
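
    If you accept matching on the message text anyway, a rough sketch using withCallingHandlers() could look like the following. It is not a robust solution: it assumes an English locale, where the message reads "NAs introduced by coercion", and the helper names as_numeric_quiet and testName1b are made up for this illustration.

    ```r
    # Hypothetical helper: muffle only the coercion-to-NA warning.
    # Assumption: English locale, so the message text matches literally.
    as_numeric_quiet <- function(x) {
      withCallingHandlers(
        as.numeric(x),
        warning = function(w) {
          if (grepl("NAs introduced by coercion", conditionMessage(w), fixed = TRUE)) {
            invokeRestart("muffleWarning")  # drop this warning, keep all others
          }
        }
      )
    }

    testName1b <- function(row) {
      sum(!is.na(as_numeric_quiet(row))) >= 8
    }

    rowNoName <- c("23-234", "Bank of Wisdom", 1:8)
    rowName <- c("code of acc", "name of acc",
                 "ac", "li", "ui", "op", "o", "p", " e", "i")
    testName1b(rowNoName)
    testName1b(rowName)
    ```

    Any other warning, such as the complex-number one, is still passed on to the caller. In a non-English locale the pattern simply would not match, so the coercion warning would reappear rather than being hidden.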

    Therefore, an alternative approach, which more specifically detects what you're interested in, would be:

    testName2 <- function(row) {
      sum(grepl("^[0-9]+$",row)) >=8
    }
    testName2(rowNoName)
    testName2(rowName)
    

    This assumes that by "numbers" you mean non-negative integers written as plain digits. If you want to detect floating-point numbers, you would need a different, more complex regular expression.
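
    As an illustration of what such an expression might look like, here is one possible pattern. It is an assumption about which formats you want to accept (optional sign, optional decimal part, optional exponent), not a complete match for everything as.numeric() understands: it rejects values with surrounding whitespace, as well as "Inf", "NaN" and hexadecimal notation. The names num_pattern, testName3 and rowFloat are made up for this sketch.

    ```r
    # Assumed pattern: optional sign, then either digits with an optional
    # decimal part or a leading-dot decimal, then an optional exponent.
    num_pattern <- "^[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?$"

    testName3 <- function(row) {
      sum(grepl(num_pattern, row)) >= 8
    }

    # Hypothetical test row with 8 numeric-looking strings
    rowFloat <- c("23-234", "Bank of Wisdom",
                  "1.5", "-2", "3e4", ".25", "7.", "8", "9.0", "10")
    rowName <- c("code of acc", "name of acc",
                 "ac", "li", "ui", "op", "o", "p", " e", "i")
    testName3(rowFloat)
    testName3(rowName)
    ```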

    More generally, you might want to make the threshold an argument and write these functions as testNamex <- function(row, min_nums=8) { ... }
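
    For instance, the regular-expression version with a configurable threshold could be sketched like this (testNameN is just a placeholder name):

    ```r
    # min_nums: minimum count of integer-like entries for a row to count as data
    testNameN <- function(row, min_nums = 8) {
      sum(grepl("^[0-9]+$", row)) >= min_nums
    }

    rowNoName <- c("23-234", "Bank of Wisdom", 1:8)
    testNameN(rowNoName)                # TRUE: exactly 8 integer entries
    testNameN(rowNoName, min_nums = 9)  # FALSE: fewer than 9
    ```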