I am trying to read an HTML table stored on Dropbox with the XML package, but the XML::readHTMLTable
function does not work on the Dropbox-hosted HTML file and I don't know why. Could someone help me?
My code:
require(httr)
require(XML)
FILE <- GET(url="https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=0")
tables <- getNodeSet(htmlParse(FILE), "//table")
FE_tab <- readHTMLTable(tables[2],
header = c("empresa","desc_projeto","desc_regiao",
"cadastrador_por","cod_talhao","descricao",
"formiga_area","qtd_destruido","latitude",
"longitude","data_cadastro"),
colClasses = c("character","character","character",
"character","character","character",
"character","character","character",
"character","character"),
trim = TRUE, stringsAsFactors = FALSE
)
head(FE_tab) ### Doesn’t work
You can do it as follows:
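library(rvest)
# dl=1 asks Dropbox for the raw file instead of its preview page
doc <- read_html("https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=1")
# html_table() returns a list of tables; take the first one
FE_tab <- doc %>% html_table() %>% `[[`(1)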
In your code, you need to use ?dl=1 at the end of the URL. Otherwise you get the source code of the Dropbox preview page, which is what is displayed when you open https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=0
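If you have several shared links, the switch from ?dl=0 to ?dl=1 can be done in code. The following is only a minimal sketch: force_direct_download is a hypothetical helper name (not part of httr, XML, or rvest), and it assumes the link uses the dl query parameter as in the URL above.

```r
# Hypothetical helper: rewrite a Dropbox share link so it ends in dl=1.
# Assumes the link carries (or can carry) a `dl` query parameter.
force_direct_download <- function(url) {
  if (grepl("[?&]dl=[0-9]+", url)) {
    # Replace an existing dl=<number> with dl=1, keeping the ? or & before it
    sub("([?&])dl=[0-9]+", "\\1dl=1", url)
  } else {
    # No dl parameter yet: append one with the right separator
    sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
    paste0(url, sep, "dl=1")
  }
}

share_url  <- "https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=0"
direct_url <- force_direct_download(share_url)
print(direct_url)
```

The resulting direct_url can then be passed to read_html() or GET() in place of the hard-coded address.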
If you still want to use the XML package, do:
FILE <- GET(url="https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=1")
tables <- getNodeSet(htmlParse(FILE), "//table")
FE_tab <- readHTMLTable(tables[[1]],
header = c("empresa","desc_projeto","desc_regiao",
"cadastrador_por","cod_talhao","descricao",
"formiga_area","qtd_destruido","latitude",
"longitude","data_cadastro"),
colClasses = c("character","character","character",
"character","character","character",
"character","character","character",
"character","character"),
trim = TRUE, stringsAsFactors = FALSE
)
head(FE_tab)
Since tables is a list, extract the element itself with tables[[1]] rather than tables[1], which returns a list. Also use index 1 instead of 2, because tables contains only one element.
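The difference between single and double brackets can be checked without any download. This is just an illustration: the tables object below is a stand-in list holding one data frame (an assumption for the demo), not the XML node set from the real page.

```r
# Stand-in for the result of getNodeSet(): a list with a single element
tables <- list(data.frame(cod_talhao = c("A1", "B2")))

length(tables)       # how many elements the list holds
class(tables[1])     # single brackets: still a list wrapping the element
class(tables[[1]])   # double brackets: the element itself

# Asking for a second element that does not exist
second <- tables[2]                # a list containing NULL, no error
is.null(second[[1]])
out_of_bounds <- tryCatch(tables[[2]], error = function(e) "error")
out_of_bounds                      # double brackets fail on a missing index
```

So readHTMLTable(tables[2], ...) in the question receives a list wrapping NULL rather than a table node, which is why it has to become tables[[1]].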