I'm dealing with a huge Excel file (thousands of columns), and the researchers have used various ways to encodeNA, icluding -100. How would I import this? I've tried
library("openxlsx")
df <- read.xlsx("file.xlsx", sheet = 1, colNames = TRUE, detectDates=TRUE, skipEmptyRows=TRUE, na.strings=c("NA", "N/A", "-100", "-"))
However, -100 still appears as, well, -100, not NA.
This appears to be a bug in openxlsx::read.xlsx
. I created a small .xlsx
document with two columns:
Then tried reading it with read.xlsx
. The na.strings
argument doesn't seem to work very well. It omits the last row with two "N/A"
values (not desired) and keeps the "-99"
values as-is rather than replacing them with NA
as desired:
library(openxlsx)
read.xlsx("test.xlsx", na.strings = c("N/A", "-99"))
# num char
# 1 1 hello
# 2 -99 -99
# 3 3 3
# for comparison, without na.strings
read.xlsx("test.xlsx")
# num char
# 1 1 hello
# 2 -99 -99
# 3 3 3
# 4 N/A N/A
The readxl
package does much better:
library(readxl)
read_excel("test.xlsx", na = "-99")
# # A tibble: 4 x 2
# num char
# <dbl> <chr>
# 1 1 hello
# 2 NA NA
# 3 3 3
# 4 NA NA
This was using a freshly installed openxlsx
version 4.1.0, and readxl
version 1.2.0 (current version is 1.3.0).
The openxlsx
github page has an open issue regarding na.strings
. I added this example. You can track/comment on the issue here.