Tags: r, web-scraping, rselenium

RSelenium Disappearing Link / Overlay Pages


Hopefully I've got an interesting problem for you all. I'm trying to get information on German Abgeordnete (basically representatives/delegates) and they have a really nice webpage that is unfortunately a bit too interactive to get at with rvest. So I thought it was time to dip my toes in RSelenium.

The following code gets the home page up in the remote browser:

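library(RSelenium)

# Start a Selenium server plus a Firefox session, then grab the client handle
rD <- rsDriver(browser = "firefox", port = 4546L, verbose = FALSE)
remDr <- rD[["client"]]
remDr$navigate("https://www.bundestag.de/abgeordnete")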

Here's what that looks like: [screenshot of the page]

I was able to get both the next-page arrow and the 12 people on the page with the following code:

next_el <- remDr$findElement('xpath', 
'//*[contains(concat( " ", @class, " " ), concat( " ", "slick-next", " " ))]')

people_el <- remDr$findElements(using = 'xpath',
                                      '//*[(@id = "modul-biografien")]//h3')
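
As a quick sanity check on that selection, the visible text of each matched h3 can be read back. This is only a minimal sketch: it assumes each heading's text is the delegate's name (not verified against the live page), and `get_names` is a hypothetical helper name, not part of RSelenium. It relies on `getElementText()`, which returns a list holding one string per element.

```r
# Hypothetical helper: pull the visible text out of a list of web elements.
# `people_el` is assumed to be the result of remDr$findElements(...) above.
get_names <- function(people_el) {
  vapply(
    people_el,
    function(el) el$getElementText()[[1]],  # getElementText() returns a list
    character(1)
  )
}
```

Calling `get_names(people_el)` should then give a character vector with one entry per heading currently matched on the page.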

But then everything sort of falls apart. Clicking on any given person opens this overlay page that my selector really hates.

Looking for a workaround, I noticed that if I drag my mouse over the Abgeordnete's name (on that page pictured above) I get a nice clean link to a standalone page for that delegate. But that information seems nowhere present in anything RSelenium has access to.

Any tips? Feel free to suggest anything way out of left field.


Solution

  • You should be able to match the links you see on mouseover by combining CSS class selectors with a descendant combinator:

    .bt-slide-content .bt-open-in-overlay
    

    Then from that returned nodeList extract the href attributes.

    The syntax should be something like:

    webElems <- remDr$findElements(using = "css selector", ".bt-slide-content .bt-open-in-overlay")
    links <- unlist(lapply(webElems, function(x) {x$getElementAttribute('href')}))
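
To gather links from more than the first set of 12, the same selector can be combined with the next-page arrow from the question. The following is a rough sketch rather than a tested solution: `collect_overlay_links`, `max_pages`, and `pause` are hypothetical names and values, the `.slick-next` class is taken from the question's XPath, and it is an assumption that clicking the arrow loads further `.bt-open-in-overlay` elements into the page. The loop is capped by `max_pages` because it is unclear how the arrow behaves on the last slide.

```r
# Sketch: collect profile links across several carousel pages.
# `remDr` is the RSelenium client from the question; `max_pages` and `pause`
# are illustrative parameters, not values taken from the site.
collect_overlay_links <- function(remDr, max_pages = 5, pause = 1) {
  links <- character(0)
  for (i in seq_len(max_pages)) {
    # Grab every overlay link currently in the DOM and read its href
    elems <- remDr$findElements(using = "css selector",
                                ".bt-slide-content .bt-open-in-overlay")
    hrefs <- unlist(lapply(elems, function(x) x$getElementAttribute("href")))
    links <- union(links, hrefs)  # keep first-seen order, drop duplicates

    # Advance the carousel; stop if no next arrow is found
    next_btn <- remDr$findElements(using = "css selector", ".slick-next")
    if (length(next_btn) == 0) break
    next_btn[[1]]$clickElement()
    Sys.sleep(pause)  # crude wait for the next slide to render
  }
  links
}
```

Each collected link could then be opened directly with `remDr$navigate()`, which sidesteps the overlay entirely. A fixed `Sys.sleep()` is the simplest way to wait here; an explicit check that new elements have appeared would be more robust.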