
R read.xlsx colClasses issue


I am having a problem using the colClasses argument of read.xlsx.

I have the following data.frame

mydata <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1")
head(mydata)
Treatment Nitrate_conc
1         1           12
2         1           12
3         1           15
4         1           16
5         1           12
6         2           18
str(mydata)
'data.frame':    20 obs. of  2 variables:
$ Treatment   : num  1 1 1 1 1 2 2 2 2 2 ...
$ Nitrate_conc: num  12 12 15 16 12 18 25 26 28 28 ...

I want to import Treatment as a factor. To do this, I tried passing the colClasses argument as shown below:

mydata1 <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1", colClasses = c("Treatment" = "factor", "Nitrate_conc" = "numeric"))

However I get the following error:

Error in class(aux) <- colClasses[ic] : adding class factor to an invalid object

Can anyone point out what I am doing wrong?


Solution

  • This is an old question, but it looks like it was never fully answered.

    This has nothing to do with whether or not the elements of the colClasses vector are named. The problem can be traced through the documentation, ?read.xlsx. In describing the colClasses parameter, it points to the documentation for readColumns. The description there says

    Only numeric, character, Date, POSIXct, column types are accepted. Anything else will be coverted to a character type.

    So specifying 'factor' is not permitted. Also note that under ... it says

    other arguments to data.frame, for example stringsAsFactors

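    So we can read Treatment as character and let data.frame convert it to a factor. Since R 4.0.0 the default for stringsAsFactors is FALSE, so pass it explicitly:

    mydata <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1",
      colClasses=c("character", "numeric"), stringsAsFactors = TRUE)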
    str(mydata)
    'data.frame':   6 obs. of  2 variables:
     $ Treatment   : Factor w/ 2 levels "1","2": 1 1 1 1 1 2
     $ Nitrate_conc: num  12 12 15 16 12 18
    

    Naming the elements of colClasses works too:

    mydata <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1",
        colClasses=c(Treatment = "character", Nitrate_conc = "numeric"),
        stringsAsFactors = TRUE)
    

    stringsAsFactors is a single switch that applies to every character column, so it may not be possible to read some columns as factors and others as strings in the same call. Of course, you can always convert a column to a factor after reading it as a different type.
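
Converting after the read could look something like the sketch below. It builds a small data frame by hand as a stand-in for what read.xlsx would return with colClasses = c("character", "numeric"), so it runs without the xlsx package or the spreadsheet. The Site column is hypothetical and not part of the original dataset; it is there only to show a string column left as character next to one converted to a factor.

```r
# Stand-in for the result of
#   read.xlsx("dataset_1.xlsx", sheetName = "dataset_1",
#             colClasses = c("character", "numeric"))
# built by hand so the sketch runs without the xlsx package or the file.
# Site is a hypothetical extra column, added only for illustration.
mydata <- data.frame(
  Treatment    = c("1", "1", "1", "1", "1", "2"),
  Site         = c("north", "north", "south", "south", "north", "south"),
  Nitrate_conc = c(12, 12, 15, 16, 12, 18),
  stringsAsFactors = FALSE
)

# Convert only the column that should be a factor; Site stays character.
mydata$Treatment <- factor(mydata$Treatment)

str(mydata)
```

This keeps the choice per column, which the single stringsAsFactors switch cannot do.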