I've imported a csv file to r. Few of my columns contains currency data (ex. starts with '$'). To remove the '$' and convert the into numeric type, I've used gsub() function and it works fine. Sample code is:
strip_dollars = function(x) { as.numeric(gsub("[\\$, ]", "", x)) }
lapply(data[ , c("x4", "x5", "x7", "x8")] , strip_dollars)
But in one particular currency column, I've few entries incuding parenthesis like ($2,345), $3,500, $5,600, ($3,234), ..... Hence, for this column the above code doesn't work. Please help me out with removing parenthesis & dollar sign from these entries.
Also, to identify columns containing currency data, I tried a lot including solutions given in R - identify which columns contain currency data $ but it doesn't work. Please suggest an optimal way of doing so.
I think the easiest way is to step through what is replaced. I'm inferring that you do not want to lose the negative-ness suggested by the parens, so we'll do two steps:
s <- c("($2,345)", "$3,500", "$5,600", "($3,234)")
gsub("[$),]", "", s)
# [1] "(2345" "3500" "5600" "(3234"
This removes most of the junk (that we did not want/need to keep), now let's deal with the leading left-paren:
gsub("^\\s*[(]", "-", gsub("[$),]", "", s))
# [1] "-2345" "3500" "5600" "-3234"
From here, if needed, you can convert to numeric
:
as.numeric(gsub("^\\s*[(]", "-", gsub("[$),]", "", s)))
# [1] -2345 3500 5600 -3234
This is not very robust to mal-formed strings. For instance, though I look for (and remove) leading space for the left-paren, anything else there will be a problem.