Tags: r, function, loops, text, data-cleaning

How to apply a function to a text column in a data frame (R)?


There is a function, words_to_numbers() from the wordstonumbers package, that converts English number words (one, two) to numerals (1, 2):

library(remotes)
remotes::install_github("fsingletonthorn/words_to_numbers")
library(wordstonumbers)

The input and output are:

input: words_to_numbers("one and threefold and five tennis")
output: "1 and threefold and 5 tennis"

It works well on a single string. My question is how to apply the same operation to a column with over 1,000 observations in a data frame.

For a data frame called data, the column to convert is data$texts. I tried:

data <- within(data, {   
       texts_new <- words_to_numbers(texts) 
     })

This gave the warning:

The argument which was passed to words_to_numbers is not a length 1 character element, only the first element has been used here. Consider using the apply or purrr::map functions to assess multiple elements at once.
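Why the within() call misbehaves can be seen in a small sketch. It assumes, as the warning says, that words_to_numbers() returns a result for the first element only. So that it runs without the GitHub package, it uses a hypothetical stand-in, first_only(), with toupper() as a placeholder transformation.

```r
# Hypothetical stand-in for words_to_numbers(): like the warning describes,
# it keeps only the first element of a longer input. toupper() is a
# placeholder transformation, not number conversion.
first_only <- function(x) {
  if (length(x) > 1) {
    warning("only the first element has been used")
    x <- x[1]
  }
  toupper(x)
}

data <- data.frame(texts = c("one apple", "two pears", "three figs"),
                   stringsAsFactors = FALSE)

# Same shape as the within() call in the question
bad <- suppressWarnings(within(data, {
  texts_new <- first_only(texts)
}))

# The single result is recycled down the whole column
bad$texts_new
#> [1] "ONE APPLE" "ONE APPLE" "ONE APPLE"
```

So besides the warning, the new column silently repeats the first row's result for every row, which is why the function has to be applied element by element.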


Solution

  • words_to_numbers() converts one string per call, so apply it to each element of the column, for example with base R's sapply() or purrr::map_chr():

    library(wordstonumbers)
    df <- tibble::tibble(words = c("one and threefold and five tennis",
                                   "ninety-nine red balloons",
                                   "The answer is forty-two"))
    
    # sapply(): calls words_to_numbers() once per element;
    # USE.NAMES = FALSE stops the input strings being attached as names
    df$as_numbers_sapply <- sapply(df$words, words_to_numbers, USE.NAMES = FALSE)
    # or purrr::map_chr(), which also guarantees a character vector back
    df$as_numbers_map <- purrr::map_chr(df$words, words_to_numbers)
    
    df
    #> # A tibble: 3 × 3
    #>   words                             as_numbers_sapply            as_numbers_map 
    #>   <chr>                             <chr>                        <chr>          
    #> 1 one and threefold and five tennis 1 and threefold and 5 tennis 1 and threefol…
    #> 2 ninety-nine red balloons          99 red balloons              99 red balloons
    #> 3 The answer is forty-two           The answer is 42             The answer is …
    

    Created on 2022-11-15 with reprex v2.0.2
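To stay in base R and keep the question's within() call, vapply() with a declared return type is a stricter alternative to sapply(): it stops with an error if any call does not return exactly one string. Vectorize() is another base R option that wraps the function in mapply(). A minimal sketch, using a hypothetical stand-in one_string_only() in place of words_to_numbers() so it runs without the GitHub package; pass words_to_numbers instead on real data.

```r
# Hypothetical stand-in for words_to_numbers(): it only accepts one string,
# so it must be applied element by element. The sub() call is a
# placeholder transformation, not the real word-to-number conversion.
one_string_only <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  sub("one", "1", x, fixed = TRUE)
}

data <- data.frame(texts = c("one apple", "two pears", "one more"),
                   stringsAsFactors = FALSE)

# vapply(): one call per element; character(1) asserts that each result
# is a single string, and USE.NAMES = FALSE keeps the result unnamed.
data <- within(data, {
  texts_new <- vapply(texts, one_string_only, character(1), USE.NAMES = FALSE)
})

# Vectorize() builds an mapply()-based wrapper that accepts whole vectors.
one_string_vec <- Vectorize(one_string_only, USE.NAMES = FALSE)
data$texts_new2 <- one_string_vec(data$texts)

data
```

vapply() is usually the safer choice inside scripts, because a function that unexpectedly returns NULL or several values fails immediately instead of producing a list column.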