r  data-manipulation  data-cleaning  dplyr

Proper capitalization for all character columns


I feel like I'm missing something obvious, but I'm trying to capitalize every word in every character column. I have a messy data set with names, addresses, and phone numbers, and I want to clean it so that the names and addresses are properly capitalized. Some names are all lowercase, some all uppercase, and some a mix.

This is what I've done (got the code for capitalizing from another question here), and I'm not sure why it's not working.

simpleCap <- function(x) { 
  s <- tolower(x) 
  s <- strsplit(s, " ")[[1]] 
  paste(toupper(substring(s, 1,1)), substring(s, 2), sep="", collapse=" ") 
} 

test <- test %>%
  mutate_if(function(.) is.character(.), sapply(., simpleCap))

The error I'm getting is: "Error in get(.x, .env, mode = "function") : object '[email protected]' of mode 'function' was not found"

EDIT: Here's an example of my data set:

test <- data.frame("name" = c("Ellie Golding", "angela smith", "JOHN DOE", "jake elSON"), 
                 "address" = c("123 magic lane", "321 MAGIC LANE", "200 magIC LANE", "99 Magic Lane"),
                 "phone" = c(123, 122, 111, 132))
test <- test %>%
    mutate(name = as.character(name), address = as.character(address), phone = as.numeric(phone))

Solution

  • Here's an approach with tools::toTitleCase, from the tools package that ships with R:

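    library(dplyr) # Version >= 1.0.0
    library(purrr)
    # Pick the columns holding at least one value that cannot be coerced to
    # integer (here, name and address); as.integer() warns about the NAs it
    # introduces. Then lowercase each selected column and title-case it.
    test %>%
      mutate(across(.cols = which(map_lgl(.,~any(is.na(as.integer(as.character(.x)))))),
                    ~ tools::toTitleCase(tolower(.))))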
    #           name        address phone
    #1 Ellie Golding 123 Magic Lane   123
    #2  Angela Smith 321 Magic Lane   122
    #3      John Doe 200 Magic Lane   111
    #4    Jake Elson  99 Magic Lane   132
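
As for why the original attempt fails: mutate_if() expects a function or a formula lambda as its second argument. As far as I can tell, sapply(., simpleCap) is instead evaluated right away against the whole data frame, and the strings it returns are then looked up as function names, which would explain the "of mode 'function' was not found" error. Below is a minimal sketch that keeps simpleCap from the question and selects columns by type rather than by integer coercion. The use of where(is.character) and vapply() is my own choice here, not part of the answer above, and it assumes dplyr >= 1.0.0.

```r
library(dplyr)  # across() and where() assume dplyr >= 1.0.0

# Sample data from the question
test <- data.frame(
  name    = c("Ellie Golding", "angela smith", "JOHN DOE", "jake elSON"),
  address = c("123 magic lane", "321 MAGIC LANE", "200 magIC LANE", "99 Magic Lane"),
  phone   = c(123, 122, 111, 132),
  stringsAsFactors = FALSE
)

# Capitalization helper from the question; it handles one string at a time
simpleCap <- function(x) {
  s <- strsplit(tolower(x), " ")[[1]]
  paste(toupper(substring(s, 1, 1)), substring(s, 2), sep = "", collapse = " ")
}

# Select character columns by type, then apply the helper to each element.
# The lambda (~ ...) defers evaluation until across() hands over a column.
cleaned <- test %>%
  mutate(across(where(is.character),
                ~ vapply(.x, simpleCap, character(1), USE.NAMES = FALSE)))

print(cleaned)
```

Unlike tools::toTitleCase, simpleCap capitalizes every space-separated word with no exceptions for small words such as "of" or "and", so the two approaches can differ on other data even though they should agree on this sample.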