Tags: r, rvest, rselenium

How do I resolve "{xml_missing}" with rvest?


This is the web page where I'm trying to get the information I need: https://www.immobiliare.it/ricerca-mappa/Torino,TO/#/linkZona_/latitudine_45.04463/longitudine_7.68199/idContratto_1/idCategoria_23/zoom_16/pag_1

and this is the XPath of the node I'm interested in:

//*[@id="box-listing"]/div[1]

When I run

out %>% html_node(xpath = '//*[@id="box-listing"]/div[1]')

I get the following output instead of the node:

{xml_missing}
<NA>

Solution

  • To solve your problem, I suggest using RSelenium.

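    There are two big families of websites: static and dynamic. A static site has the information we need inside the HTML that the server sends (for example, a Wikipedia page). A dynamic site doesn't actually have the information in that HTML; it builds it with JavaScript in the browser every time it is needed (for example, TripAdvisor). rvest only downloads and parses the HTML sent by the server and does not run any JavaScript, so on a dynamic page the listings you see in the browser are not there yet: the XPath matches nothing and html_node() returns {xml_missing}. On this page, note also that everything after the # in the URL (the map coordinates, zoom and page number) is a fragment, which the browser never sends to the server; it is read by the page's own JavaScript.

    Thanks to the RSelenium library we are able to scrape information from a dynamic website. What is Selenium? Selenium is a framework that drives a real web browser and can emulate human behaviour (navigating, clicking, typing). It has bindings for Python, Java and other languages, and RSelenium is the R one. Its main use is automated testing of web applications, but that is not our case here.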

    Selenium is a very big world (see here to dig deeper).

    For RSelenium, I suggest checking these links:

    GitHub repository

    Presentation

    Below is a small RSelenium example for your question:

        library(RSelenium)
        
        # Start a Selenium server and a Firefox client
        driver <- rsDriver(browser = "firefox", port = 4445L)
        remote_driver <- driver[["client"]]
        
        # Send the URL to the Firefox browser
        remote_driver$navigate("https://www.immobiliare.it/ricerca-mappa/Torino,TO/#/linkZona_/latitudine_45.04462/longitudine_7.68199/idContratto_1/idCategoria_23/zoom_16/pag_1")
        
        # Give the page's JavaScript time to load the listings
        # (a fixed pause is a simple assumption; adjust it to your connection)
        Sys.sleep(5)
        
        # Some examples of what RSelenium can do:
        
        # Get the text of the first listing
        text_1 <- remote_driver$findElement(using = "css selector", '#box-listing > div:nth-child(1) > div:nth-child(1)')$getElementText()
        print(text_1)
        # [[1]]
        # [1] "PREMIUM\nImmobile\n€ 150.000\n60 m² • 2 locali"
        
        # Click the same element
        remote_driver$findElement(using = "css selector", '#box-listing > div:nth-child(1) > div:nth-child(1)')$clickElement()
        
        # When finished, close the browser and stop the Selenium server
        remote_driver$close()
        driver$server$stop()
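To make the static/dynamic difference concrete, here is a small offline sketch using rvest alone. Both HTML strings are hypothetical and heavily simplified: raw_html stands in for what the server sends before any JavaScript runs (the listing container exists but is empty), and rendered_html stands in for the page after the browser has run its JavaScript. The same XPath from the question finds nothing in the first string, which should print the same {xml_missing} / <NA> shown in the question, and finds the first listing in the second.

```r
library(rvest)

# Hypothetical, simplified HTML: roughly what rvest receives from the server
# before any JavaScript has run (the listing container is still empty).
raw_html <- '<html><body><div id="box-listing"></div></body></html>'

# Hypothetical HTML after the browser has executed the page's JavaScript
# and inserted the listings. With RSelenium, the real equivalent would be
# remote_driver$getPageSource()[[1]].
rendered_html <- '<html><body><div id="box-listing">
  <div><div>PREMIUM Immobile 150.000</div></div>
</div></body></html>'

# The XPath from the question
xpath <- '//*[@id="box-listing"]/div[1]'

# Parsing the raw HTML: the XPath matches nothing, so html_node() returns xml_missing
raw_node <- html_node(read_html(raw_html), xpath = xpath)
print(raw_node)

# Parsing the rendered HTML: the same XPath now finds the first listing
rendered_node <- html_node(read_html(rendered_html), xpath = xpath)
print(html_text(rendered_node, trim = TRUE))
```

In a real session you could keep using rvest for the parsing and let RSelenium do only the rendering, by passing remote_driver$getPageSource()[[1]] to read_html() after navigating and waiting.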