
`stringr` to convert first letter only to uppercase in dataframe


I would like to capitalize the first letter of each word in a column, without converting the remaining letters to lowercase. I am trying to use stringr since it's vectorized and plays well with dataframes, but I would also accept another solution. Below is a reprex showing my desired output and various attempts. I am able to select the first letter only, but I am not sure how to capitalize it. Thank you for your help!

I also reviewed related posts, but wasn't sure how to apply those solutions in my case (i.e., within a dataframe):

First letter to upper case

Capitalize the first letter of both words in a two word string

library(dplyr)
library(stringr)

words <-
  tribble(
    ~word, ~number,
    "problems", 99,
    "Answer", 42,
    "golden ratio", 1.61,
    "NOTHING", 0
  )

# Desired output
new_words <-
  tribble(
    ~word, ~number,
    "Problems", 99,
    "Answer", 42,
    "Golden Ratio", 1.61,
    "NOTHING", 0
  )

# Converts first letter of each word to upper and all other to lower
mutate(words, word = str_to_title(word))
#> # A tibble: 4 x 2
#>   word         number
#>   <chr>         <dbl>
#> 1 Problems      99   
#> 2 Answer        42   
#> 3 Golden Ratio   1.61
#> 4 Nothing        0

# Some attempts
mutate(words, word = str_replace_all(word, "(?<=^|\\s)([a-zA-Z])", "X"))
#> # A tibble: 4 x 2
#>   word         number
#>   <chr>         <dbl>
#> 1 Xroblems      99   
#> 2 Xnswer        42   
#> 3 Xolden Xatio   1.61
#> 4 XOTHING        0
mutate(words, word = str_replace_all(word, "(?<=^|\\s)([a-zA-Z])", "\\1"))
#> # A tibble: 4 x 2
#>   word         number
#>   <chr>         <dbl>
#> 1 problems      99   
#> 2 Answer        42   
#> 3 golden ratio   1.61
#> 4 NOTHING        0

Created on 2021-07-26 by the reprex package (v2.0.0)


Solution

  • Here is a base R solution using gsub:

    words$word <- gsub("\\b([a-z])", "\\U\\1", words$word, perl=TRUE)
    

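    This replaces the first lowercase letter of every word with its uppercase version and leaves all other letters untouched. The \U escape in the replacement is only available with perl=TRUE. Note that the \b word boundary matches wherever a word character follows a non-word character or the start of the string, so it catches a lowercase letter preceded by whitespace or by the start of the column's value, and also one preceded by punctuation such as an apostrophe or a hyphen.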
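To see how the gsub approach behaves on the question's data, here is a minimal base R sketch. The helper names `cap_boundary` and `cap_whitespace` and the extra "don't stop" row are assumptions added for illustration; they are not part of the original question or answer. The second helper is an alternative that only capitalizes after whitespace or at the start of the string, which may be closer to what the lookbehind attempt in the question was aiming for.

```r
# Same data as the question, plus one hypothetical row containing an apostrophe.
words <- data.frame(
  word = c("problems", "Answer", "golden ratio", "NOTHING", "don't stop"),
  number = c(99, 42, 1.61, 0, 7),
  stringsAsFactors = FALSE
)

# \b variant from the answer: uppercases a lowercase letter at any word boundary,
# including one that follows punctuation.
cap_boundary <- function(x) {
  gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
}

# Whitespace variant (an assumption, not from the answer): only matches a lowercase
# letter at the start of the string or right after whitespace. Group 1 is written
# back unchanged, then \\U uppercases group 2.
cap_whitespace <- function(x) {
  gsub("(^|\\s)([a-z])", "\\1\\U\\2", x, perl = TRUE)
}

words$boundary <- cap_boundary(words$word)
words$whitespace <- cap_whitespace(words$word)
print(words)
```

The two helpers agree on the four original rows and differ only on the added one: the `\b` version turns "don't stop" into "Don'T Stop", while the whitespace version gives "Don't Stop". Because gsub is vectorized, either helper can also be called inside `mutate()`, for example `mutate(words, word = cap_whitespace(word))`.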