What is the most efficient way to clean currency data in R?

This is more of an open-ended question, since I'm not looking for one particular code response. I have an example dataset, which is just a manually modified snippet of the set I am using. What I want to know is, for you more experienced R data analysts, how would you approach cleaning this data? Prioritising speed and efficiency (in terms of both my time and computing time), what steps would you take and why, and which R packages are most useful? With enough googling and effort I have figured it out myself, but I'm certain my solutions are inelegant and inefficient. Do you have any little-known tricks or functions that can fix multiple problems at once?

Imagine this is sales data pulled from a website, with discounts and the number of product ratings. Initially, every column is of character type. (Note: I made it annoyingly messy on purpose.)

price	discounted_price	discount_percent	rating_count
USD499	USD176.63	65%	15,188
USD299	USD399	25%	34,899
19900 $	11,499 $	42	4703
$475	$309	35%	4,26,973
$ 22,900	$ 13490	41%	16,299
699.99USD	263.28USD	62	450
$899	$349	61%	149
USD1999	USD1,299	35	590

The clean table should look like this:

price	discounted_price	discount_percent	rating_count
$499.00	$176.63	65%	15,188
$299.00	$399.00	25%	34,899
$19,900.00	$11,499.00	42%	4,703
$475.00	$309.00	35%	426,973
$22,900.00	$13,490.00	41%	16,299
$699.99	$263.28	62%	450
$899.00	$349.00	61%	149
$1,999.00	$1,299.00	35%	590

Tasks to perform:

Make all price values in the format 0,000.00 with a comma separator for bigger values
Make all price values have a $ prefix
Make percentage column in the format xx%
Make rating_count in the format 0,000

Edit: Removed task to make all columns numeric as it didn't make sense

Solution

Setting aside your original "all columns numeric" constraint (your expected output is all strings), this is easily handled with scales in a dplyr pipe: strip the non-numeric characters, convert to numbers, then format each column for display.

library(dplyr)
library(scales) # dollar, percent, comma
quux %>%
  mutate(
    across(everything(), ~ as.numeric(gsub("[^-0-9.]", "", .))),
    discount_percent = discount_percent/100,
    across(c(price, discounted_price), ~ dollar(.)),
    discount_percent = percent(discount_percent, accuracy=1),
    rating_count = comma(rating_count)
  )
#        price discounted_price discount_percent rating_count
# 1    $499.00          $176.63              65%       15,188
# 2    $299.00          $399.00              25%       34,899
# 3 $19,900.00       $11,499.00              42%        4,703
# 4    $475.00          $309.00              35%      426,973
# 5 $22,900.00       $13,490.00              41%       16,299
# 6    $699.99          $263.28              62%          450
# 7    $899.00          $349.00              61%          149
# 8  $1,999.00        $1,299.00              35%          590
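
To see why one `gsub()` call copes with every variant in the sample data, here is a minimal sketch of the cleaning step on its own, run on a few of the messy strings from the question. The pattern `[^-0-9.]` deletes every character that is not a digit, a minus sign, or a decimal point, so currency labels, spaces, percent signs, and thousands separators all disappear in one pass. The vector name `messy` is only for illustration.

```r
# A few of the messy values from the question (illustrative vector name)
messy <- c("USD499", "19900 $", "$ 22,900", "699.99USD", "4,26,973", "65%")

# Keep only digits, minus signs, and decimal points
stripped <- gsub("[^-0-9.]", "", messy)
stripped
# "499" "19900" "22900" "699.99" "426973" "65"

# The stripped strings now parse cleanly as numbers
cleaned <- as.numeric(stripped)
```

Note that this assumes `.` is the decimal mark and `,` is only ever a grouping separator, as in the sample data. A value written as `1.299,00` would not survive this pattern and would need different handling.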

If you want to do any analysis or numeric processing, then separate this into two stages:

quux %>%
  mutate(
    across(everything(), ~ as.numeric(gsub("[^-0-9.]", "", .))),
    discount_percent = discount_percent/100
  ) %>%
  # do your analysis/calculations here
  mutate(
    across(c(price, discounted_price), ~ dollar(.)),
    discount_percent = percent(discount_percent, accuracy=1),
    rating_count = comma(rating_count)
  )

If you have any other columns that might already be numeric, you may want to replace everything() with where(is.character). (You can use it anyway if you choose.)
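
As a rough sketch of that variation, the example below restricts the cleaning step to character columns. The data frame `mixed` and its `units_sold` column are hypothetical and only serve to show that an already-numeric column is left alone.

```r
library(dplyr)

# Hypothetical data: one messy character column, one already-numeric column
mixed <- data.frame(
  price = c("USD499", "$ 22,900"),
  units_sold = c(3, 12),
  stringsAsFactors = FALSE
)

# Only character columns go through gsub() and as.numeric()
cleaned_mixed <- mixed %>%
  mutate(across(where(is.character), ~ as.numeric(gsub("[^-0-9.]", "", .))))
```

Skipping numeric columns avoids an unnecessary round trip through character, which can be lossy, for instance for values that R renders in scientific notation.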

This can also easily be done with base R or data.table, though the scales package makes most of the formatting much easier.
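
For comparison, here is one way the same two-stage approach might look in base R, with `formatC()` and `paste0()` standing in for the scales helpers. This is a sketch rather than a drop-in replacement: the helper names `fmt_dollar` and `fmt_count` are made up for illustration, and only three rows of the question's data are used to keep it short.

```r
# Three rows of the question's data, to keep the sketch short
quux <- data.frame(
  price = c("USD499", "19900 $", "699.99USD"),
  discounted_price = c("USD176.63", "11,499 $", "263.28USD"),
  discount_percent = c("65%", "42", "62"),
  rating_count = c("15,188", "4703", "450"),
  stringsAsFactors = FALSE
)

# Stage 1: strip everything except digits, minus signs, and decimal points
num <- quux
num[] <- lapply(num, function(x) as.numeric(gsub("[^-0-9.]", "", x)))
num$discount_percent <- num$discount_percent / 100

# (analysis on the numeric columns would go here)

# Stage 2: format for display (helper names are illustrative)
fmt_dollar <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 2, big.mark = ","))
}
fmt_count <- function(x) formatC(x, format = "d", big.mark = ",")

out <- num
price_cols <- c("price", "discounted_price")
out[price_cols] <- lapply(num[price_cols], fmt_dollar)
out$discount_percent <- paste0(round(100 * num$discount_percent), "%")
out$rating_count <- fmt_count(num$rating_count)
```

Unlike `scales::dollar()`, the `fmt_dollar` helper here does nothing special for negative or missing values, so it would need extending if the real data contains them.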

If your real data has more columns than this example, you can generalize the formatting a bit by selecting columns on their name suffixes instead of listing them one by one:

quux %>%
  mutate(
    across(everything(), ~ as.numeric(gsub("[^-0-9.]", "", .))), 
    across(ends_with("percent"), ~ ./100)
  ) %>%
  # do your analysis/calculations here
  mutate(
    across(ends_with("price"), ~ dollar(.)), 
    across(ends_with("percent"), ~ percent(., accuracy=1)), 
    across(ends_with("count"), ~ comma(.))
  )

Data

quux <- structure(list(price = c("USD499", "USD299", "19900 $", "$475", "$ 22,900", "699.99USD", "$899", "USD1999"), discounted_price = c("USD176.63", "USD399", "11,499 $", "$309", "$ 13490", "263.28USD", "$349", "USD1,299"), discount_percent = c("65%", "25%", "42", "35%", "41%", "62", "61%", "35"), rating_count = c("15,188", "34,899", "4703", "4,26,973", "16,299", "450", "149", "590")), class = "data.frame", row.names = c(NA, -8L))