R: Two Identically Structured Excel Files Return Different Data Types in Data Frames

I have two different Excel files, excel1 and excel2.

I am reading them in using separate but identical functions:

df1 <- readxl::read_xlsx("excel1.xlsx", sheet = "Ad Awareness", skip = 7)
df2 <- readxl::read_xlsx("excel2.xlsx", sheet = "Ad Awareness", skip = 7)

However, when I run head() on each, here is what df1 returns:

  calDate             Score
  <dttm>              <dbl>
1 2016-10-17 00:00:00  17.8
2 2016-10-18 00:00:00  17.2
3 2016-10-19 00:00:00  20.3

And here is what df2 returns:

  calDate Score
    <dbl> <lgl>
1   43025 NA   
2   43026 NA   
3   43027 NA

Is there any reason why the data types are being read in differently? There is nothing different about the files.

Solution

read_xlsx() guesses the variable types from your data (see the readxl documentation on cell and column types for more information).

So what you are describing could be due to:

a different amount of data in the two files (the types are guessed from the first guess_max rows, 1000 by default, so a column that is empty over those rows is read as logical)
changes you might have made in Excel to the cell format (those changes are not always visually obvious in Excel)
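
To illustrate the second point: the calDate values in your df2 output look like Excel date serial numbers, which is what you would get if those cells carry a numeric rather than a date format. A minimal sketch in base R, assuming the workbook uses Excel's default 1900 date system (the serials vector is just the three values copied from your output):

```r
# Values copied from the df2 output in the question
serials <- c(43025, 43026, 43027)

# Assumption: the workbook uses Excel's default 1900 date system,
# for which the conventional origin in R is 1899-12-30
dates <- as.Date(serials, origin = "1899-12-30")

print(dates)
# [1] "2017-10-17" "2017-10-18" "2017-10-19"
```

If the converted dates are the ones you expect, the cell format is the likely culprit for that column.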

Without seeing your data, it is hard to say more than this.

But you can control this with the col_types argument:

col_types: Either ‘NULL’ to guess all from the spreadsheet or a character vector containing one entry per column from these options: "skip", "guess", "logical", "numeric", "date", "text" or "list". If exactly one ‘col_type’ is specified, it will be recycled. The content of a cell in a skipped column is never read and that column will not appear in the data frame output. A list cell loads a column as a list of length 1 vectors, which are typed using the type guessing logic from ‘col_types = NULL’, but on a cell-by-cell basis.
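
As a rough sketch of how that could look for your files: the code below assumes the sheet has exactly two columns after the skipped rows (calDate and Score, as in your output), because col_types needs one entry per column unless a single value is recycled. The names awareness_types and read_awareness are made up for this illustration.

```r
# Assumption: exactly two columns (calDate, Score) remain after skip = 7
awareness_types <- c("date", "numeric")

# Hypothetical helper: read the sheet with explicit column types
# instead of letting readxl guess them
read_awareness <- function(path) {
  readxl::read_xlsx(
    path,
    sheet = "Ad Awareness",
    skip = 7,
    col_types = awareness_types
  )
}

# Usage (requires the files from the question to exist):
# df1 <- read_awareness("excel1.xlsx")
# df2 <- read_awareness("excel2.xlsx")
```

With the types fixed like this, both data frames should come back with the same column classes. readxl may warn about cells it has to coerce, and a Score column that is genuinely empty will still be all NA, which would point to a difference in the file itself.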