When reading a CSV file via fread and using colClasses to read the columns as numerics, I am having trouble with data that consists of numbers with commas instead of dots. Since the data files have different origins, some use "." and some use "," as decimal separator.
library(data.table)
dt <- data.table(a = c("1,4", "2,0", "4,5", "3,5", "6,9"), c = 10:14)
write.csv(dt, "dt.csv", row.names = FALSE)
dcsv <- fread("dt.csv", colClasses = list(numeric = 1:2), dec = ",")
I have 2 problems:
1. I want to read both columns as numerics, so I tried using dec = ",". I now get an error: Column number 2 (colClasses[[1]][2]) is out of range [1,ncol=1]. So I changed to colClasses = list(numeric = 1), but I don't quite understand this. Still, the first column turns out to be character instead of numeric.
2. How can I set dec to both "." and ",", since I don't know in advance which decimal separator each of the hundreds of files uses? I tried a vector, but it did not work.
What am I missing? Thanks for any help!
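It is not normal to have a file that mixes two different numeric separators, so you should question the source of the file first.
In this example the comma is also the field separator written by write.csv, so fread cannot use it as the decimal separator as well: with dec = "," it no longer splits the lines on commas and finds only one column, hence the ncol=1 in the error.
Nevertheless, if you do have such a file, the reliable way to read it is to read the comma-decimal columns as strings and then convert them to numeric.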
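library(data.table)
dt <- data.table(a = c("1,4", "2,0", "4,5", "3,5", "6,9"), c = 10:14)
write.csv(dt, "dt.csv", row.names = FALSE)
# Read with "." as decimal separator: column a stays character, c is integer
dcsv <- fread("dt.csv", dec = ".")
# fread already strips the quotes; turn the decimal comma into a dot and convert
dcsv[, a := as.numeric(gsub(",", ".", a, fixed = TRUE))]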
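For comparison, dec = "," does work in fread when the field separator is something other than a comma. The sketch below assumes the comma-decimal files are semicolon-separated, as produced by write.csv2 (an assumption about your data, not something the question states); the names dt2 and f2 are illustrative only.

```r
library(data.table)

# Illustrative data: column a holds real numbers this time
dt2 <- data.table(a = c(1.4, 2.0, 4.5, 3.5, 6.9), c = 10:14)

# write.csv2 writes ";" as field separator and "," as decimal mark
f2 <- tempfile(fileext = ".csv")
write.csv2(dt2, f2, row.names = FALSE)

# The field separator is ";", so "," is free to act as the decimal mark
d2 <- fread(f2, sep = ";", dec = ",")
str(d2)  # a should come back as numeric, c as integer
```

If your comma-decimal files really are comma-separated (as in the question), this shortcut does not apply and the string conversion above is needed.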
If you don't know whether a column uses a comma or a dot as decimal separator, you can loop over the columns, test whether each one is a character column containing only digits and a comma, and convert only the columns that fulfil that condition.
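A minimal sketch of that loop, assuming every comma-decimal value looks like optional minus, digits, comma, digits. The helper convert_comma_cols is hypothetical (not part of data.table); it uses data.table's set() to replace matching columns by reference. Caveats: a thousands separator such as "1,234" would be misread as 1.234, and character columns made only of digits (for example IDs like "007") would also be converted.

```r
library(data.table)

# Hypothetical helper (not part of data.table): converts, by reference, each
# character column whose non-empty values all look like "digits,digits"
convert_comma_cols <- function(d) {
  pattern <- "^-?[0-9]+(,[0-9]+)?$"  # optional minus, digits, optional ",digits"
  for (col in names(d)) {
    x <- d[[col]]
    if (!is.character(x)) next       # columns fread already parsed are skipped
    vals <- x[!is.na(x) & x != ""]
    if (length(vals) > 0 && all(grepl(pattern, vals))) {
      set(d, j = col, value = as.numeric(sub(",", ".", x, fixed = TRUE)))
    }
  }
  invisible(d)
}

dt <- data.table(a = c("1,4", "2,0", "4,5", "3,5", "6,9"), c = 10:14)
f <- tempfile(fileext = ".csv")
write.csv(dt, f, row.names = FALSE)

dcsv <- fread(f, dec = ".")  # a arrives as character, c as integer
convert_comma_cols(dcsv)
str(dcsv)
```

Because dot-decimal columns are already parsed as numeric by fread, the same call can be applied to every file regardless of which separator it uses.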