
Use a logical vector with sapply


I am trying to use a logical vector to tell sapply which columns of my dataset to convert to numeric.

My data contains NAs, but every variable is either numeric or character. I take the first complete case (hard-coded below, but I would love suggestions!) and build a logical vector based on whether the first character of each value is a digit or a letter. I would like to use that logical vector to tell sapply which columns to make numeric.

library(dplyr)  # %>% and mutate()
library(tidyr)  # gather()

# make the data frame; cbind() builds a character matrix,
# so this should return an all-character data frame
color <- c("red", "blue", "yellow")
number <- c(NA, 1, 3)
other.number <- c(4, 5, 7)
df <- cbind(color, number, other.number) %>% as.data.frame()

# get the first character of each variable in the first complete case
temp <- sapply(df, function(x) substr(x, 1, 1)) %>% as.data.frame() %>%
  .[2,] %>% # hard-coded: row 2 is the first complete case
  gather() %>%
  # make the logical variable, which can be used as a vector
  mutate(vec = ifelse(value %in% letters, FALSE, TRUE))
# goal: apply this vector with sapply + as.numeric to df

Solution

  • This is a strange case, but if you need to decide which columns are numeric based on their first element, one idea is to try converting that element to numeric. Any element that is not a number returns NA (as the warning states), so you can use the result as an index. Calling na.omit(df) first drops the rows that contain NAs, so i[1] comes from the first complete case. For example,

    ind <- sapply(na.omit(df), function(i) !is.na(as.numeric(i[1])))
    

    # Warning message:
    # In FUN(X[[i]], ...) : NAs introduced by coercion

    ind
    #       color       number other.number 
    #       FALSE         TRUE         TRUE 
    
    df[ind] <- lapply(df[ind], as.numeric)
    
    str(df)
    #'data.frame':  3 obs. of  3 variables:
    # $ color       : chr  "red" "blue" "yellow"
    # $ number      : num  NA 1 3
    # $ other.number: num  4 5 7
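
A variation on the approach above, offered only as a sketch: it assumes a column should become numeric when every non-missing value converts cleanly, rather than testing just the first complete case. The helper name `is_numeric_like` is hypothetical (it is not part of the original answer), and `suppressWarnings()` is used here to silence the coercion warning.

```r
# Rebuild the example data: every column is character
df <- data.frame(
  color        = c("red", "blue", "yellow"),
  number       = c(NA, "1", "3"),
  other.number = c("4", "5", "7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a column has at least one non-NA value
# and every non-NA value parses as a number
is_numeric_like <- function(x) {
  x <- x[!is.na(x)]
  length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
}

# Logical vector with one entry per column
ind <- vapply(df, is_numeric_like, logical(1))

# Convert only the flagged columns
df[ind] <- lapply(df[ind], as.numeric)

str(df)
```

Because every non-missing value is checked, this variant does not depend on locating the first complete case, at the cost of scanning each column in full.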
    

    DATA

    dput(df)
    structure(list(color = c("red", "blue", "yellow"), number = c(NA, 
    "1", "3"), other.number = c("4", "5", "7")), .Names = c("color", 
    "number", "other.number"), row.names = c(NA, -3L), class = "data.frame")