Search code examples
rcurrencyread.csvsuffix

Parse currency values from CSV, convert numerical suffixes for Million and Billion


I'm curious if there's any sort of out of the box functions in R that can handle this.

I have a CSV file that I am reading into a data frame using read.csv. One of the columns in the CSV contains currency values in the format of

Currency
--------
$1.2M
$3.1B
N/A

I would like to convert those into more usable numbers that calculations can be performed against, so it would look like this:

Currency
----------
1200000
3100000000
NA

My initial thoughts were to somehow subset the dataframe into 3 parts based on rows that contain *M, *B, or N/A. Then use gsub to replace the $ and M/B, then multiply the remaining number by 1000000 or 1000000000, and finally rejoin the 3 subsets back into 1 data frame.

However I'm curious if there's a simpler way to handle this sort of conversion in R.


Solution

  • We could use gsubfn to replace the 'B', 'M' with 'e+9', 'e+6' and convert to numeric (as.numeric).

    is.na(v1) <- v1=='N/A'
    options(scipen=999)
    library(gsubfn)
    as.numeric(gsubfn('([A-Z]|\\$)', list(B='e+9', M='e+6',"$"=""),v1)) 
    #[1]    1200000 3100000000         NA
    

    EDIT: Modified based on @nicola's suggestion

    data

    v1 <- c('$1.2M', '$3.1B', 'N/A')