I need to import a bunch of .csv files into R. I do this using the following code:
Dataset <- read.csv(paste0("./CSV/State_level/",file,".csv"),header = F,sep = ";",dec = "," , stringsAsFactors = FALSE)
The input is an .csv file with "," as separator for decimal places. Unfortunately there are quite a few entries as follows: 20,012,054. This should really be: 20012,054 and leads to either NAs but usually the whole df being imported as character and not numeric which I'd like to have.
How do I get rid of the first "," when looking from left to right and only if the number has more than 3 figuers infront of the decimal-comma?
Here is a sample of how the data looks in the .csv-file:
A data.frame might look like this:
df<-data.frame(a=c(0.5,0.84,12.25,"20,125,25"), b=c("1,111,054",0.57,105.25,0.15))
I used "." as decimal separator in this case to make it a number, which in the .csv is a ",", but this is not the issue for numbers in the format: 123,45.
Thank you for your ideas & help!
We can use sub
to get rid of the first ,
df[] <- lapply(df, function(x) sub(",(?=.*,)", "", x, perl = TRUE))
Just to show it would leave the ,
if there is only a single ,
in the code
sub(",(?=.*,)", "", c("0,5", "20,125,25"), perl = TRUE)
#[1] "0,5" "20125,25"