Is there an R function to clean messy salaries in character format?

I have a column of messy salary data. I am wondering if there is a package that has a function made specifically for cleaning this type of messy data. My data looks like:

data.frame(salary = c("40,000-60,000", "40-80K", "$100,000", 
                  "$70/hr", "Between $65-80/hour", "$100k",
                  "50-60,000 a year", "90"))
#>                salary
#> 1       40,000-60,000
#> 2              40-80K
#> 3            $100,000
#> 4              $70/hr
#> 5 Between $65-80/hour
#> 6               $100k
#> 7    50-60,000 a year
#> 8                  90

Created on 2020-12-16 by the reprex package (v0.3.0)

I would like the cleaned column to be numeric and expressed as an annual salary. I know how to clean this column manually; I'm just wondering whether any packages (other than readr::parse_number()) can help.

The expected output would look like:

#>   output
#> 1  50000
#> 2  60000
#> 3 100000
#> 4 145600
#> 5 150800
#> 6 100000
#> 7  55000
#> 8  90000
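Judging from the numbers, the expected output appears to follow three implicit rules: a range becomes its midpoint, an hourly rate is annualized at 2,080 hours (40 hours x 52 weeks), and a bare "90" is read as thousands. The question does not state these rules; they are an inference. A quick check of that arithmetic in base R, under those assumptions:

```r
# Assumption: a full-time year of 40 hours x 52 weeks
hours_per_year <- 40 * 52                  # 2080
row1 <- mean(c(40000, 60000))              # range midpoint: 50000
row4 <- 70 * hours_per_year                # hourly rate: 145600
row5 <- mean(c(65, 80)) * hours_per_year   # hourly range: 150800
row8 <- 90 * 1000                          # bare "90" read as thousands: 90000
```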

Solution

Here are some first steps you can try. I define two functions: one replaces a k or K that follows a digit with ",000". The other appends ",000" to the lower bound of a range when only the upper bound is written out in thousands (e.g. "50-60,000").

# Replace every k/K suffix with ",000", e.g. "$100k" -> "$100,000"
rem_k <- function(x) {
  gsub("(\\d)[kK]", "\\1,000", x)
}

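# If only the upper bound is written in thousands ("50-60,000"),
# append ",000" to the lower bound as well ("50,000-60,000").
# sub() returns non-matching strings unchanged, so no ifelse() is needed.
add_zero <- function(x) {
  sub("(^|[^0-9,])(\\d{1,3})-(\\d{1,3},000)", "\\1\\2,000-\\3", x)
}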

Finally, I strip every character except digits, commas, and hyphens:

library(dplyr)  # provides %>% and mutate()
df %>%  # df: the data frame from the question
  mutate(salary2 = gsub("[^0-9,-]", "", add_zero(rem_k(salary))))

               salary       salary2
1       40,000-60,000 40,000-60,000
2              40-80K 40,000-80,000
3            $100,000       100,000
4              $70/hr            70
5 Between $65-80/hour         65-80
6               $100k       100,000
7    50-60,000 a year 50,000-60,000
8                  90            90
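This leaves cleaned strings rather than annual numbers. One possible last step is sketched below in base R. It rests on assumptions read off the expected output, not stated in the question: a range becomes its midpoint, text mentioning "hr" or "hour" is an hourly rate worked 2,080 hours a year, and a non-hourly number below 1,000 is in thousands. `to_annual()` is a hypothetical helper, and the two vectors are copied from the tables above so the snippet runs on its own.

```r
# Original strings and the cleaned salary2 strings from the output above
salary <- c("40,000-60,000", "40-80K", "$100,000", "$70/hr",
            "Between $65-80/hour", "$100k", "50-60,000 a year", "90")
salary2 <- c("40,000-60,000", "40,000-80,000", "100,000", "70",
             "65-80", "100,000", "50,000-60,000", "90")

# Hypothetical helper. Assumptions: a range becomes its midpoint, text that
# mentions "hr"/"hour" is an hourly rate worked hours_per_year hours, and a
# non-hourly number below 1,000 is given in thousands.
to_annual <- function(raw, cleaned, hours_per_year = 2080) {
  # "40,000-60,000" -> c("40,000", "60,000") -> c(40000, 60000) -> 50000
  parts <- strsplit(cleaned, "-", fixed = TRUE)
  mid <- vapply(parts,
                function(p) mean(as.numeric(gsub(",", "", p, fixed = TRUE))),
                numeric(1))
  hourly <- grepl("hr|hour", raw, ignore.case = TRUE)
  ifelse(hourly, mid * hours_per_year,
         ifelse(mid < 1000, mid * 1000, mid))
}

annual <- to_annual(salary, salary2)
annual
# values: 50000 60000 100000 145600 150800 100000 55000 90000
```

The 2,080-hour year and the thousands rule are guesses, so rows near those thresholds deserve a manual look: a value such as "$900" would be read as 900,000.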