Tags: r, csv, character, scientific-notation, readr

Reading a CSV file with readr: keep a character column but expand scientific notation


I have a CSV file with a variable (id). When I check the cell format in Excel, some cells are formatted as General and some as Scientific:

#            id
# 1 ge189839898     #general format cell in excel
# 2   we7267178     #general format cell in excel
# 3     2.8E+12     #scientific format cell in excel

When I read the file into R using read_csv(), the column is guessed as character (which it is, and which is what I want), but that means 2.8E+12 is also read in as the literal string "2.8E+12".

options(digits = 22, scipen = 9999)
library(tidyverse)
dfcsv <- read_csv("file.csv")
#where dfcsv looks like:
dfcsv <- data.frame(id = c("ge189839898",
                        "we7267178",
                        "2.8E+12"))
dfcsv
#            id
# 1 ge189839898     
# 2   we7267178    
# 3     2.8E+12  
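For context: options(scipen = 9999) in the code above only changes how numeric values are printed. Because read_csv() has already stored the id as the string "2.8E+12", there is nothing numeric for scipen to act on. A minimal base R sketch of the difference (id_chr and id_num are illustrative names, not from the original post):

```r
# The id as read_csv() stores it: a character string, not a number
id_chr <- "2.8E+12"

# Printing options such as scipen only affect numeric values,
# so the string is shown unchanged
old <- options(scipen = 9999)
print(id_chr)
# [1] "2.8E+12"
options(old)

# Converting to numeric first is what makes the expansion possible
id_num <- as.numeric(id_chr)
format(id_num, scientific = FALSE)
# [1] "2800000000000"
```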

Is there a way to read in the CSV automatically so that a mixed-type column is still returned as character, but with scientific notation expanded:

#               id
# 1    ge189839898
# 2      we7267178
# 3  2800000000000

I don't think guess_max is what I'm after here. I would also prefer to avoid grep/sprintf-type solutions if possible, since they feel like fixing a problem I shouldn't have in the first place. I only found these problematic ids by chance, so I would like an automated way of handling this at the reading-in stage.

The cleanest solution is probably to convert the values in the CSV file itself, but I want to do it in R.

Thanks


Solution

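id <- c("ge189839898", "we7267178", "2.8E+12")

func <- function(x) {
  # Values that parse as numbers (e.g. "2.8E+12") become numeric; the rest become NA
  poss_num <- suppressWarnings(as.numeric(x))
  isna <- is.na(poss_num)
  # Rewrite only the numeric-looking values in fixed notation;
  # trim = TRUE stops format() padding them to a common width
  x[!isna] <- format(poss_num[!isna], scientific = FALSE, trim = TRUE)
  x
}

func(id)
# [1] "ge189839898"   "we7267178"     "2800000000000"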
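One caveat with the function above: any id that parses as a number is rewritten, so an id such as "00123" would come back as "123". If that matters, one option is to expand only values that look like scientific notation, and to read every column as character so nothing is converted on the way in. This is a sketch in base R, and it does use grepl(), which the question hoped to avoid. expand_sci() and read_ids() are hypothetical helper names, and the regular expression is an assumption about what the ids look like: a genuine alphanumeric id such as "12E45" would still be treated as a number. With readr, the equivalent read step would be read_csv(file, col_types = cols(.default = col_character())).

```r
# Hypothetical helper: expand only strings that look like scientific notation,
# e.g. "2.8E+12" or "1e5"; plain ids such as "00123" are left untouched
expand_sci <- function(x) {
  is_sci <- grepl("^[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)[eE][+-]?[0-9]+$", x)
  # Format each value on its own so results are not padded to a shared width
  x[is_sci] <- vapply(as.numeric(x[is_sci]), format, character(1),
                      scientific = FALSE)
  x
}

# Hypothetical reader: read every column as character (so nothing is
# converted on the way in), then expand the id column
read_ids <- function(..., id_col = "id") {
  df <- utils::read.csv(..., colClasses = "character")
  df[[id_col]] <- expand_sci(df[[id_col]])
  df
}

df <- read_ids(text = "id\nge189839898\nwe7267178\n2.8E+12\n00123")
df$id
# [1] "ge189839898"   "we7267178"     "2800000000000" "00123"
```

Passing ... straight through to read.csv() means the same helper works with a file path (read_ids("file.csv")) or with inline text as in the usage line.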