I have a csv file with a variable (id). In Excel, when I check the format of the cells, some cells are type general and some are scientific:
# id
# 1 ge189839898 #general format cell in excel
# 2 we7267178 #general format cell in excel
# 3 2.8E+12 #scientific format cell in excel
When I read the file into R using read_csv, it thinks that the column is character (which it is and what I want), but it means 2.8E+12 is also a character.
options(digits = 22, scipen = 9999)
library(tidyverse)
dfcsv <- read_csv("file.csv")
# where dfcsv looks like:
dfcsv <- data.frame(id = c("ge189839898",
                           "we7267178",
                           "2.8E+12"))
dfcsv
# id
# 1 ge189839898
# 2 we7267178
# 3 2.8E+12
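For what it's worth, the column type can also be pinned explicitly so the character guess is deterministic (a sketch using read_csv's col_types argument with cols() and col_character()); that alone does not expand the scientific notation, though:

# force id to be read as character rather than relying on type guessing
dfcsv <- read_csv("file.csv",
                  col_types = cols(id = col_character()))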
Is there a way to automatically read in the csv so that variables with mixed types are correctly identified, i.e. it would return a character variable but with the scientific notation expanded:
# id
# 1 ge189839898
# 2 we7267178
# 3 2800000000000
I don't think guess_max is what I am after here. I would also prefer not to use grep/sprintf type solutions (if possible) as I think that is trying to fix a problem I shouldn't have? I found these problematic ids by chance, so I would like an automated way of doing this at the reading-in stage.
The cleanest solution is probably to go into the csv file and make the conversion there, but I want to do it through R.
Thanks
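One option is a small helper that converts only the values which successfully parse as numbers and reformats them without scientific notation: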
id <- c("ge189839898", "we7267178", "2.8E+12")

func <- function(x) {
  # attempt to parse every value as numeric; genuine character ids become NA
  poss_num <- suppressWarnings(as.numeric(x))
  isna <- is.na(poss_num)
  # rewrite only the values that parsed, suppressing scientific notation
  x[!isna] <- format(poss_num[!isna], scientific = FALSE)
  x
}

func(id)
# [1] "ge189839898" "we7267178" "2800000000000"
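To fold this into the reading-in step, the helper can be run over the character columns straight after read_csv. This is a sketch that assumes the same file.csv as above and that every character column should be treated this way; it uses dplyr's across() and where():

library(tidyverse)

dfcsv <- read_csv("file.csv") %>%
  # apply the helper to every character column right after reading
  mutate(across(where(is.character), func))

One caveat: when more than one value in a column is converted, format() right-justifies them to a common width, so you may want format(..., trim = TRUE) (or trimws()) inside the helper to avoid padded results.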