Read an HTML table from Dropbox with the XML package

I am trying to read an HTML table stored in Dropbox with the XML package, but the XML::readHTMLTable function does not work on the HTML file in Dropbox and I don't know why. Could someone help me?

My code:



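library(httr)  # GET()
library(XML)   # htmlParse(), getNodeSet(), readHTMLTable()

# Open the HTML table file in Dropbox
FILE <- GET(url="")

# Read the table
tables <- getNodeSet(htmlParse(FILE), "//table")
FE_tab <- readHTMLTable(tables[2],
                    # the name and class vectors are cut off after three entries in the post
                    header = c("empresa","desc_projeto","desc_regiao"),
                    colClasses = c("character","character","character"),
                    trim = TRUE, stringsAsFactors = FALSE)
head(FE_tab) ### Doesn't work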


  • You can do it as follows:

    library(rvest)  # read_html(), html_table(), and the %>% pipe
    doc <- read_html("")
    FE_tab <- doc %>% html_table() %>% `[[`(1)

    In your code you need to put ?dl=1 at the end of the URL. Otherwise you get the source code of the Dropbox preview page, the one that is displayed when you open the link in a browser, instead of the HTML file itself.
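
    The ?dl=1 rule can be applied programmatically. Below is a minimal sketch in base R, not part of the original answer: force_direct_download is a hypothetical helper name, and the share link is a made-up placeholder, not a real file.

    ```r
    # Hypothetical helper: make sure a Dropbox share link ends up with dl=1.
    force_direct_download <- function(url) {
      if (grepl("[?&]dl=[01]", url)) {
        # An explicit dl flag is present: switch it to 1
        sub("([?&])dl=[01]", "\\1dl=1", url)
      } else if (grepl("?", url, fixed = TRUE)) {
        # Other query parameters exist: append dl=1 to them
        paste0(url, "&dl=1")
      } else {
        # No query string yet: start one
        paste0(url, "?dl=1")
      }
    }

    # Placeholder share link (assumption, for illustration only)
    share_url  <- "https://www.dropbox.com/s/abc123/table.html?dl=0"
    direct_url <- force_direct_download(share_url)
    print(direct_url)
    ```

    The resulting direct_url is what you would pass to GET() or read_html() in place of the original share link.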

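    If you still want to use the XML package, do:

    library(httr)  # GET()
    library(XML)   # htmlParse(), getNodeSet(), readHTMLTable()

    FILE <- GET(url="")
    tables <- getNodeSet(htmlParse(FILE), "//table")
    FE_tab <- readHTMLTable(tables[[1]],
                            # the name and class vectors are cut off after three entries in the post
                            header = c("empresa","desc_projeto","desc_regiao"),
                            colClasses = c("character","character","character"),
                            trim = TRUE, stringsAsFactors = FALSE)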

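    Because tables is a list, extract the node with double brackets, tables[[1]]; single brackets would return a one-element list rather than the table node itself. Use 1 instead of 2 because tables contains only one element.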
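
    The difference between the two kinds of brackets can be checked without any download. The sketch below uses a plain base R list as a stand-in for the result of getNodeSet(); the string "<table node>" is a placeholder for illustration, not a real XML node.

    ```r
    # Stand-in for getNodeSet() output: a list holding a single element.
    tables <- list("<table node>")

    single <- tables[1]    # single brackets: still a list of length 1
    double <- tables[[1]]  # double brackets: the element itself

    print(class(single))   # "list"
    print(class(double))   # "character"

    # Index 2 does not exist in a one-element list:
    missing_single <- tables[2]  # a list whose only element is NULL
    missing_double <- tryCatch(tables[[2]], error = function(e) e)

    print(is.null(missing_single[[1]]))       # TRUE
    print(inherits(missing_double, "error"))  # TRUE
    ```

    This mirrors the question: tables[2] does not fail on its own, it silently hands an empty placeholder to the next function, which is why the problem only shows up later.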