Search code examples
rcsvformattingnumber-formattingread.csv

Remove comma which is a thousands separator in R


I need to import a bunch of .csv files into R. I do this using the following code:

Dataset <- read.csv(paste0("./CSV/State_level/",file,".csv"),header = F,sep = ";",dec = "," ,  stringsAsFactors = FALSE)

The input is an .csv file with "," as separator for decimal places. Unfortunately there are quite a few entries as follows: 20,012,054. This should really be: 20012,054 and leads to either NAs but usually the whole df being imported as character and not numeric which I'd like to have.

How do I get rid of the first "," when looking from left to right and only if the number has more than 3 figuers infront of the decimal-comma?

Here is a sample of how the data looks in the .csv-file: enter image description here

A data.frame might look like this:

df<-data.frame(a=c(0.5,0.84,12.25,"20,125,25"), b=c("1,111,054",0.57,105.25,0.15))

I used "." as decimal separator in this case to make it a number, which in the .csv is a ",", but this is not the issue for numbers in the format: 123,45.

Thank you for your ideas & help!


Solution

  • We can use sub to get rid of the first ,

    df[] <-  lapply(df, function(x) sub(",(?=.*,)", "", x, perl = TRUE))
    

    Just to show it would leave the , if there is only a single , in the code

    sub(",(?=.*,)", "", c("0,5", "20,125,25"), perl = TRUE)
    #[1] "0,5"      "20125,25"