
Import with xlsx package in R gives NA, <NA> and empty entries; can't delete NA values


I am importing data from an xlsx file (https://www.dropbox.com/s/r5sn5pio5rnprdq/gesammelte%20Daten_1707.xlsx) with read.xlsx:

library(xlsx)
setwd("C:/***//Kultivierungen//1707_ADH//")
pH <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="pH")
OD <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="OD")
Glc <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Glucose")
Ac <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Acetate")

I want to delete the rows that contain NA values with:

OD <- OD[rowSums(is.na(OD))==0,]
Glc <- Glc[rowSums(is.na(Glc))==0,]
Ac <- Ac[rowSums(is.na(Ac))==0,]
pH <- pH[rowSums(is.na(pH))==0,]

This works fine for the OD and pH data, but not for Ac and Glc. The result before deleting the NA values looks like this:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
1  0.000000               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
2  1.502222               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

And after deleting the NA values:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

str() returns the following:

> str(Glc)
'data.frame':   9 obs. of  17 variables:
 $ time.in.h : num  0 1.5 3.69 10.25 16.25 ...
 $ SPL1      : Factor w/ 5 levels "","0.0602636534839925",..: NA NA 2 1 4 3 1 1 5
 $ SPL1_Error: Factor w/ 4 levels "","0.03","0.06",..: NA NA 3 1 3 2 1 1 4
 $ SPL2      : Factor w/ 5 levels "","0.0161127895266868",..: NA NA 3 1 4 2 1 1 5
 $ SPL2_Error: Factor w/ 5 levels "","0.09","0.12",..: NA NA 2 1 3 4 1 1 5

This worked fine before with a different data set/xlsx file. I also tried to rule out formatting issues in the xlsx file, but couldn't find anything. Has anyone run into this before?


Solution

  • It seems that the empty cells in the Glucose and Acetate sheets are recognized as text, although I am not sure why (Excel is not really my area of expertise). As the str() output shows, those columns are imported as factors, and the blank cells become the empty string "" (an ordinary factor level) rather than NA, so is.na() does not flag them.

    When I replace the empty cells in a column of the xlsx file with 0 and then delete those 0s again, read.xlsx imports the column as a numeric vector instead of a factor and assigns NA to the empty cells. You can then use data <- data[rowSums(is.na(data))==0,] to remove the rows that contain NAs.

    I can't tell you exactly what is going on here, but the above solution seems to work.
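
If editing the workbook is not convenient, the same effect can probably be had on the R side. The following is only a sketch under assumptions, not the method described above: `Glc_demo` is a made-up toy data frame standing in for the imported sheet, and `to_numeric` is a hypothetical helper, not part of the xlsx package. It maps "" to NA, converts the factor columns to numeric, and then applies the same row filter as in the question.

```r
# Toy stand-in for the imported Glucose sheet (values invented for
# illustration): blank cells arrive as the empty string "", not as NA.
Glc_demo <- data.frame(
  time.in.h  = c(0, 1.5, 3.7, 10.2, 16.2),
  SPL1       = factor(c(NA, NA, "0.06", "", "0.12")),
  SPL1_Error = factor(c(NA, NA, "0.06", "", "0.06"))
)

# is.na() is FALSE for "", so the blank row (row 4) survives this filter.
kept_before <- Glc_demo[rowSums(is.na(Glc_demo)) == 0, ]

# Hypothetical helper: convert a factor or character column to numeric.
# Going through as.character() converts the labels rather than the
# factor's internal integer codes; "" is mapped to NA beforehand.
to_numeric <- function(x) {
  if (is.factor(x) || is.character(x)) {
    x <- as.character(x)
    x[x %in% ""] <- NA
    x <- as.numeric(x)
  }
  x
}

# Apply the helper to every column while keeping the data frame structure.
Glc_num <- Glc_demo
Glc_num[] <- lapply(Glc_num, to_numeric)

# The original filter now drops both the NA rows and the formerly blank row.
Glc_clean <- Glc_num[rowSums(is.na(Glc_num)) == 0, ]
print(Glc_clean)
```

Run against the real data, the same two steps would be applied to `Glc` and `Ac` right after `read.xlsx`. Whether this matches the result of the Excel workaround exactly has not been checked against the original file.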