Hopefully I've got an interesting problem for you all. I'm trying to get information on German Abgeordnete (basically representatives/delegates), and they have a really nice webpage that is unfortunately a bit too interactive to scrape with rvest. So I thought it was time to dip my toes into RSelenium.
The following code opens the home page in the remote browser:
library(RSelenium)
# Start a Selenium server and a Firefox client on port 4546
rD <- rsDriver(browser = "firefox", port = 4546L, verbose = FALSE)
remDr <- rD[["client"]]
remDr$navigate("https://www.bundestag.de/abgeordnete")
I was able to select both the next-page arrow and the 12 people shown with the following code:
# The carousel's "next page" arrow
next_el <- remDr$findElement(using = 'xpath',
  '//*[contains(concat( " ", @class, " " ), concat( " ", "slick-next", " " ))]')
# The name headings of the 12 people currently displayed
people_el <- remDr$findElements(using = 'xpath',
  '//*[(@id = "modul-biografien")]//h3')
But then everything sort of falls apart. Clicking on any person opens an overlay page that my selectors can't handle.
Looking for a workaround, I noticed that if I hover my mouse over a delegate's name on that overlay page, I get a nice clean link to a standalone page for that delegate. But that link doesn't seem to be present in anything RSelenium has access to.
Any tips? Feel free to suggest anything way out of left field.
You should be able to match the links you see on mouseover with a combination of CSS class selectors joined by a descendant combinator:
.bt-slide-content .bt-open-in-overlay
Then extract the href attributes from the returned node list.
From looking at a couple of RSelenium examples, the syntax should be something like:
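# All overlay links inside the carousel slides
webElems <- remDr$findElements(using = "css selector", ".bt-slide-content .bt-open-in-overlay")
# getElementAttribute() returns a list, so flatten the results into a character vector
links <- unlist(lapply(webElems, function(x) {x$getElementAttribute('href')}))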
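If the carousel only exposes a dozen people at a time, the same idea could be wrapped in a loop that clicks the question's "slick-next" arrow and keeps collecting hrefs. This is a minimal sketch, not a tested scraper: it assumes each click reveals new slides, that the carousel eventually wraps back to people already seen (the loop stops when a page adds no new links), and that a fixed `Sys.sleep()` pause is long enough for the slides to render. The function name `collect_all_links` and its `max_pages`/`pause` parameters are hypothetical.

```r
# Hypothetical helper: walk the carousel and gather the overlay links.
# `remDr` is the client returned by rsDriver(); only its findElements(),
# findElement() and element methods are used, so no extra packages are needed.
collect_all_links <- function(remDr, max_pages = 100, pause = 1) {
  link_css   <- ".bt-slide-content .bt-open-in-overlay"
  next_xpath <- '//*[contains(concat( " ", @class, " " ), concat( " ", "slick-next", " " ))]'
  links <- character(0)
  for (i in seq_len(max_pages)) {
    elems <- remDr$findElements(using = "css selector", link_css)
    hrefs <- unlist(lapply(elems, function(x) x$getElementAttribute("href")))
    # Keep only links not seen before (setdiff also drops duplicates within hrefs)
    new_links <- setdiff(hrefs, links)
    # Assumption: nothing new means the carousel wrapped around or stopped loading
    if (length(new_links) == 0) break
    links <- c(links, new_links)
    # Advance the carousel and give the page a moment to render the next slides
    remDr$findElement(using = "xpath", next_xpath)$clickElement()
    Sys.sleep(pause)
  }
  links
}
```

Usage would be `links <- collect_all_links(remDr)`, followed by `remDr$close()` and `rD$server$stop()` to shut the browser and Selenium server down. Stopping on "no new links" rather than on a hard-coded page count avoids having to know how many delegates there are, at the cost of relying on the wrap-around assumption above.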