Search code examples
rreadxl

Problem with loading huge excel file(100mb) when using read_xlsx. Returns wrongly "TRUE" in some cells


I am working with a huge dataframe and had some problems loading it from the excel file. I could only load it using read_xlsx from the readxl package. However i have now realized that some of the cells contains "TRUE" instead of the real value from the excel file. How can it load the file wrongly and is there any solution to avoid this?


Solution

  • Following this advice solved the problem.

    JasonAizkalns: Hard to tell, but this may be caused from allowing read_xlsx to "guess" the column types. If you know the column type beforehand, it's always best to specify them with the col_types parameter. In this case, it may have guessed that column type was logical when really it's supposed to be something else (say, text or numeric)

    Cleaning the dataset from columns with none numeric values and then using x<-read_xlsx(filename, skip = 1, col_types = "numeric"). Hereafter i y<- read_xlsx(filename, skip = 1, col_types = "date") on the column containing dates. I used cbind(y,x) to complete the dataset with the none numeric column. It seems that read_xlsx misinterprets the columns with numeric values if there is a lot of values missing.