I am struggling to download an excel file and then loading it to R:
utils::download.file(
url = 'https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php',
destfile = 'C:/users/arthu/Desktop/fines.rar',
mode = "wb"
)
After unzipping and trying to load it into R:
utils::unzip(
zipfile = './fines.rar',
exdir = './ibama_data'
)
dados <- readxl::read_xls(
"./ibama_data/rel_areas_embargadas_0-65000_2020-12-10_080019.xls"),
skip = 6,
col_type = c(rep("guess", 13), "date", "guess", "date")
)
I get libxls error: Unable to open file
.
If I try to rename the file as .xlsx as follows, I get an evaluation error when reading it with readxl::read_excel
, saying unable to open file
file <- file.rename(
from = "./Desktop/ibama_data/rel_areas_embargadas_0-65000_2020-12-10_080019.xls",
to = "./Desktop/ibama_data/test.xlsx"
)
However, if I manually open such a file, excel throws me a warning saying that the file's extension does not match its type. After saving it as .xlsx, I can finally load it using read_excel
How can I solve this, given that I want to write a package with a function that downloads such data from the web and then loads it into R?
Edit
The .xls
file you are trying to read isn't an Excel document, it's an HTML table.
You could read it using XML
package :
library(XML)
doc <- htmlParse('rel_areas_embargadas_0-65000_2021-01-13_080018.xls')
tableNode <- getNodeSet(doc, '//table')
data <- XML::readHTMLTable(tableNode[[1]])
#Store header
header <- data[1:5,]
#Store colnames
colnames <- data[6,]
#Remove header
data <- data[-1:-6,]
#Set colnames
colnames(data)<-colnames
head(data)