I am reading a CSV into R, where several columns contain percentages that are formatted as text strings with a percentage symbol at the end, e.g. "35%". readr::read_csv() interprets these as character-type data, but I want the data to be numeric so I can perform analysis.
The following code achieves this, but seems like a lot of "hoops" to jump through. Is there a standard function (or option for a function) that does the same thing? There doesn't seem to be a relevant option in the read_csv() function.
convert_percentage_string <- function(percentage_string) {
percentage_string %>%
stringr::str_extract(., "[0-9]+") %>%
as.numeric()
}
read_csv("my_data.csv") %>%
mutate_at(columns_with_percentages, convert_percentage_string)
Sample data:
tribble(~name, ~count, ~percentage,
"Alice", 4, "40%",
"Bob", 10, "65%",
"Carol", 15, "15%")
Expected result:
tribble(~name, ~count, ~percentage,
"Alice", 4, 40,
"Bob", 10, 65,
"Carol", 15, 15)
Here's a dplyr
and readr
solution:
library(dplyr) # Version >= 1.0.0
library(readr)
library(stringr)
data %>%
mutate(across(where(~any(str_detect(.,"%"))), parse_number))
# A tibble: 3 x 3
name count percentage
<chr> <dbl> <dbl>
1 Alice 4 40
2 Bob 10 65
3 Carol 15 15
Feel free to replace any
with all
if you prefer.
A benefit of this approach is it detects columns that have the %
and only parses those. No need to know which columns need to be converted in advance.